A Guide to Retrievability on Filecoin

archived 24 May 2025 01:38:19 UTC
luca
2025/01/30
March 26, 2025 3:04 PM
Protocol Research
Luca, January 2025

🌐 Introduction

This document aims to be the entry point for anyone interested in retrievability on Filecoin. Each section presents one aspect of retrievability, considered in the context of web3 and, specifically, of the Filecoin network.
Filecoin is the largest decentralized storage network to date, boasting over 4 EiB of raw byte capacity and approximately 120 PiB of user data stored (see here, keeping in mind that FIL+ deals have a 10x QAP multiplier).
For any storage network to achieve long-term success, it must ensure reliable data retrieval. In the case of Filecoin, retrievability refers to the ability to reliably access and retrieve files stored by Storage Providers (SPs). Unlike traditional cloud storage systems, where data retrieval is guaranteed by a centralized provider, Filecoin is a decentralized system. This introduces unique challenges related to network performance, data redundancy, SP reliability, and incentives.
This document explores the various strategies and protocols available within Filecoin to enhance data retrievability, along with their guarantees and limitations. It also gives an overview of different payment strategies and of the ways SPs and clients can select each other.
Moreover, it presents a range of ideas for potential improvements to retrievability that could be explored at the protocol design level.
Each section ends with a summary table that gives a bird's-eye view of its content.
All the tables can be found in 📊 List of Summary Tables.

👩🏼‍🎨 Retrievability on Filecoin: State of the Art

Filecoin enables decentralized storage, where clients store data with Storage Providers (SPs) and later retrieve that data on demand. However, unlike storage, which is provable (via Proof of Replication and Proof of SpaceTime), retrieval is a separate process that isn’t always provable. It depends on protocols and strategies that address different challenges.
Retrievability is influenced by several factors, including:
Network and SP performance
Data availability
Retrieval optimization protocols
Filecoin offers various flavors of retrievability—strategies and protocols that help ensure data can be retrieved when needed. These range from redundancy models to retrieval networks and off-chain solutions.

🤸🏼 Retrievability Options on Filecoin

1. Basic Retrieval with Single SP

Description: A client stores data with a single SP and retrieves it directly from that SP when needed. This is the most basic form of retrieval.
Challenges: If the SP becomes unavailable (due to network failure, maintenance, or other issues), retrieval will fail.
Use Case: Suitable for non-critical data where high availability isn’t a priority.
Guarantee: No guarantee of successful retrieval; retrieval success is solely dependent on the SP’s infrastructure and reliability.
Status: Live

2. Retrieval with Data Redundancy (Replication)

Description: Data replication involves storing multiple copies of data across different SPs to ensure availability. This increases the likelihood of successful retrieval even if one SP is down.
Challenges: Replication increases storage costs and complexity, particularly in deal management.
Use Case: Ideal for critical data where retrieval success is more important than cost.
Guarantee: Increased retrieval reliability but not a full guarantee. The more replicas, the higher the chances of retrieval success, but retrieval still depends on an SP willing and able to serve the request.
Status: Live
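To make "the more replicas, the higher the chances" concrete, the sketch below uses a deliberately simple model: each replica is available with the same probability p, independently of the others, so the chance that at least one of n replicas can serve a request is 1 - (1 - p)^n. Both p and the independence assumption are hypothetical; real SP failures are often correlated (shared regions, software, operators), and "available" is not the same as "willing to serve", so the numbers are optimistic.

```python
def retrieval_probability(p_single: float, n_replicas: int) -> float:
    """Probability that at least one of n replicas is available.

    Hypothetical model: every replica is available with the same
    probability p_single, independently of the others.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be in [0, 1]")
    if n_replicas < 1:
        raise ValueError("n_replicas must be at least 1")
    # Retrieval fails only if every replica is unavailable at once.
    p_all_down = (1.0 - p_single) ** n_replicas
    return 1.0 - p_all_down


for n in (1, 2, 3):
    print(f"{n} replica(s): {retrieval_probability(0.9, n):.3f}")
```

With p = 0.9, going from one to three replicas raises the estimate from 0.900 to 0.999; correlated failures would make the real gain smaller.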

3. Retrieval with off-chain Service-Level Agreements (SLAs)

Description: Some SPs offer Service Level Agreements (SLAs), which define specific guarantees regarding retrieval speed, availability, and reliability. SLAs can cover (a subset of) features like response times, uptime commitments, availability, etc.
Challenges: SLAs can only guarantee performance up to the agreed terms, and the SP’s infrastructure must be trusted to meet the SLA’s conditions. Moreover, at the current stage, formalizing an SLA has explicit costs of its own (e.g., fees for legal support).
Use Case: Suitable for businesses or clients requiring more predictable and reliable retrieval performance for mission-critical applications.
Guarantee: Performance guarantees (speed, availability) per the SLA, but no 100% retrieval guarantee.
Status: Live
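Because an off-chain SLA is only as useful as the client's ability to check it, the sketch below compares one reporting period of monitoring probes against two example SLA terms: a minimum availability and a ceiling on the 95th-percentile time to first byte. The probe record format, the term names, and the thresholds are hypothetical; real SLAs define their own metrics and measurement methods.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Probe:
    """One monitoring probe against an SP (hypothetical record format)."""
    succeeded: bool
    ttfb_ms: Optional[float] = None  # time to first byte; None if the probe failed


@dataclass
class SlaTerms:
    """Hypothetical SLA terms; real SLAs are negotiated case by case."""
    min_availability: float  # e.g. 0.995 = 99.5% of probes must succeed
    max_p95_ttfb_ms: float   # ceiling on the 95th-percentile time to first byte


def percentile(values: Sequence[float], q: float) -> float:
    """Nearest-rank percentile, q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def check_sla(probes: List[Probe], terms: SlaTerms) -> dict:
    """Compare one reporting period of probes against the SLA terms."""
    if not probes:
        raise ValueError("need at least one probe")
    ok = [p.ttfb_ms for p in probes if p.succeeded and p.ttfb_ms is not None]
    availability = sum(p.succeeded for p in probes) / len(probes)
    # Latency is only measured on successful probes.
    p95 = percentile(ok, 95) if ok else math.inf
    return {
        "availability": availability,
        "p95_ttfb_ms": p95,
        "availability_met": availability >= terms.min_availability,
        "latency_met": p95 <= terms.max_p95_ttfb_ms,
    }


probes = [Probe(True, 120.0)] * 98 + [Probe(False)] * 2
report = check_sla(probes, SlaTerms(min_availability=0.99, max_p95_ttfb_ms=500.0))
print(report)
```

Here the SP meets the latency term but misses the availability term (98% against 99%), which is exactly the kind of partial compliance an SLA has to spell out consequences for.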

4. Filecoin’s Spark Protocol Reputation System

Description: The Spark protocol enhances retrieval performance by building a reputation system based on retrieval success rates. SPs are evaluated based on their past retrieval performance, such as success rates and latency. Clients are directed to SPs with the best track records.
Challenges: Spark improves retrieval likelihood but doesn’t guarantee successful retrieval for every request. It only samples file retrieval performance and does not guarantee full-file availability upon request. Additionally, Spark currently supports limited sampling, which may not cover all file types or be frequent enough. A broader list of callouts/concerns (maintained by the Spark team) is here.
Use Case: Useful for clients seeking improved retrieval speed and confidence in data access.
Guarantee: Higher likelihood of successful retrieval, but no 100% guarantee. Focuses on improving retrieval performance through incentive-based SP selection.
Status: Live, being improved

5. Proof of Data Possession (PDP)

Description: Proof of Data Possession (PDP) ensures that a storage provider still has access to the data they claim to store, verified periodically. While this doesn't guarantee retrieval at the moment of request, it proves that the data is still available in an accessible form.
Challenges: PDP guarantees data possession but does not address retrieval performance (e.g., latency or availability issues). It ensures the data is available in an “unencoded” format but doesn’t verify the infrastructure required to deliver fast retrieval.
Use Case: Ideal for long-term storage verification, where clients want proof that their data is intact and periodically accessible.
Guarantee: Data possession guarantee in a retrievable form, but no guarantee of successful retrieval on demand.
Status: Testnet
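Possession proofs are commonly built on challenge-response over a Merkle tree: the verifier keeps only the root, periodically challenges a random chunk index, and the prover must return that chunk together with the sibling hashes needed to rebuild the root. The sketch below shows that generic mechanism under stated assumptions (SHA-256, a power-of-two number of chunks, no leaf/node domain separation); it is not the Filecoin PDP protocol, contract, or proof format.

```python
import hashlib
import secrets
from typing import List


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(chunks: List[bytes]) -> List[List[bytes]]:
    """Return every tree level, leaves first (chunk count must be a power of two)."""
    n = len(chunks)
    if n == 0 or n & (n - 1):
        raise ValueError("number of chunks must be a non-zero power of two")
    levels = [[h(c) for c in chunks]]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([h(prev[i] + prev[i + 1]) for i in range(0, len(prev), 2)])
    return levels


def prove(levels: List[List[bytes]], index: int) -> List[bytes]:
    """SP side: the sibling hash at each level, from the leaf up to below the root."""
    path = []
    for level in levels[:-1]:
        path.append(level[index ^ 1])  # index ^ 1 is the sibling position
        index //= 2
    return path


def verify(root: bytes, chunk: bytes, index: int, path: List[bytes]) -> bool:
    """Verifier side: rebuild the root from the challenged chunk and its path."""
    node = h(chunk)
    for sibling in path:
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == root


# The client keeps only the root; the SP keeps the data (and the tree).
chunks = [f"chunk-{i}".encode() for i in range(8)]
levels = build_tree(chunks)
root = levels[-1][0]

challenge = secrets.randbelow(len(chunks))  # fresh random challenge each period
proof = prove(levels, challenge)
print(verify(root, chunks[challenge], challenge, proof))  # SP answers correctly
print(verify(root, b"lost data", challenge, proof))       # wrong chunk is rejected
```

Passing such a challenge shows the SP could read the challenged chunk at that moment, which matches the section's point: it proves possession in a readable form, not that a full retrieval will be served quickly on demand.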

6. Third Party Backup and Caching

Description: Third-party backups involve storing copies of critical data in centralized or third-party systems, such as cloud storage. Caching can also be used to store frequently accessed data closer to users, improving retrieval speed and reliability.
Challenges: Off-chain storage adds operational complexity and costs. Additionally, using centralized systems somewhat undermines the decentralized ethos of Filecoin.
Use Case: For clients requiring high availability for critical data, combining Filecoin with non-Filecoin backup storage provides a safety net if Filecoin retrieval fails.
Guarantee: Full retrieval guarantee from off-chain backups, but it is not decentralized.
Status: Doable (this alternative is external to Filecoin and can be pursued with any centralized or third-party system)
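On the client side, caching and third-party backups usually come together as a fallback chain: try a local cache, then the Filecoin SP, then the backup, and accept data only if it matches the expected content hash. In the sketch below the fetchers are hypothetical callables (a real client would wrap a retrieval client, an HTTP gateway, or a cloud SDK), and a plain SHA-256 digest stands in for the content identifier a real client would check.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Tuple

# A fetcher takes a content key and returns the bytes, or None on failure.
Fetcher = Callable[[str], Optional[bytes]]


def _matches(data: Optional[bytes], expected_sha256: str) -> bool:
    return data is not None and hashlib.sha256(data).hexdigest() == expected_sha256


def retrieve(key: str, expected_sha256: str,
             sources: List[Tuple[str, Fetcher]],
             cache: Dict[str, bytes]) -> Tuple[str, bytes]:
    """Try the cache, then each source in order; return (source_name, data)."""
    if _matches(cache.get(key), expected_sha256):
        return "cache", cache[key]
    for name, fetch in sources:
        try:
            data = fetch(key)
        except Exception:
            data = None  # an erroring source is treated like a miss
        if _matches(data, expected_sha256):
            cache[key] = data  # keep a hot copy for the next request
            return name, data
    raise LookupError(f"{key!r} could not be retrieved from any source")


payload = b"archived dataset"
digest = hashlib.sha256(payload).hexdigest()
sources = [
    ("filecoin-sp", lambda key: None),      # SP unreachable in this run
    ("cloud-backup", lambda key: payload),  # third-party backup answers
]
cache: Dict[str, bytes] = {}
print(retrieve("dataset-1", digest, sources, cache)[0])  # served by the backup
print(retrieve("dataset-1", digest, sources, cache)[0])  # now served from the cache
```

Putting the Filecoin SP before the backup keeps the centralized copy as a safety net rather than the default path, which limits how much the setup leans on centralized systems.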

7. Content Delivery Networks (CDNs) Gateway

Description: CDNs currently offer the only method providing rational guarantees for data retrieval, by enforcing penalties or fault fees if data cannot be retrieved. A committee is invoked if a client is unable to retrieve a file directly. Retriev.org is one such example, where external arbitration is used to ensure data delivery.
Challenges: Integrating CDNs with Filecoin requires external solutions, which introduce additional costs (retrieval service cost and collateralization). CDNs may also require complex multi-round protocols and coordination among committee members, potentially impacting retrieval speed.
Use Case: Ideal for clients needing strong retrievability guarantees and who can afford the added costs and protocol complexity.
Guarantee: Retrieval is rationally guaranteed via arbitration and penalties for failure.
Status: POC (See retriev.org)
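The arbitration step can be sketched as a vote: when a client complains, each committee member tries the retrieval itself, and if a majority fails, a penalty moves from the SP's collateral to the client. The guarantee is "rational" because it holds only under an honest-majority committee and only as far as the collateral reaches. This is not retriev.org's protocol; the `RetrievalDeal` fields, the tie rule, and the amounts are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RetrievalDeal:
    """Hypothetical retrieval commitment backed by SP collateral."""
    sp_collateral: float  # funds the SP locked up for this commitment
    penalty: float        # paid to the client when a failure is upheld


def file_was_retrievable(votes: List[bool]) -> bool:
    """Each vote is one committee member's own retrieval attempt (True = success).

    A strict majority is required; ties count as a failure, which is an
    arbitrary choice that favors the client.
    """
    if not votes:
        raise ValueError("the committee must cast at least one vote")
    return 2 * sum(votes) > len(votes)


def settle_complaint(deal: RetrievalDeal, votes: List[bool]) -> float:
    """Return the amount moved from the SP's collateral to the client."""
    if file_was_retrievable(votes):
        return 0.0  # complaint rejected
    payout = min(deal.penalty, deal.sp_collateral)
    deal.sp_collateral -= payout
    return payout


deal = RetrievalDeal(sp_collateral=100.0, penalty=25.0)
print(settle_complaint(deal, [False, False, True]))  # majority failed: penalty applies
print(deal.sp_collateral)
```

The multi-round coordination mentioned under Challenges shows up here as the need to collect every member's vote before settling, which is where the extra retrieval delay comes from.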

📚 Different Guarantees for Different Client Needs

Retrievability on Filecoin is not a one-size-fits-all solution. The best strategy depends on the specific needs of the client, including the importance of retrieval speed, reliability, and cost constraints.
Clients may opt for simple retrieval from a single SP for non-critical data or more complex redundancy, SLA agreements, or off-chain backups for mission-critical files requiring high availability. Advanced protocols like Spark or CDN integration offer higher performance and reliability, but they come at an added cost or complexity.
Choosing the right solution is key to ensuring a balance between cost, performance, and reliability in the Filecoin ecosystem.

On Trust Challenges and CDN-like Solutions

In decentralized storage networks like Filecoin, trust is, at the current stage, crucial for ensuring that retrievability commitments are fulfilled. A lack of trust (or simply dishonest behavior) can lead to issues such as retrieval failures or delays, especially with some payment strategies like upfront payment (see below for a detailed overview).
Introducing a CDN-like solution (e.g., retriev.org) could address these trust challenges by:
Providing monitoring services to ensure SPs meet obligations.
Offering arbitration mechanisms to resolve disputes and penalize non-performing SPs.
Ensuring that retrieval promises are backed by financial incentives and penalties.

📊 Summary Table

| Retrievability Option | Description | Challenges | Guarantee | Trust Assumptions |
| --- | --- | --- | --- | --- |
| 1. Basic Retrieval with Single SP | Retrieve data from a single SP. | If SP becomes unavailable, retrieval fails. | No guarantee; depends entirely on SP reliability. | Trust SP infrastructure and reliability for retrieval. |
| 2. Retrieval with Data Redundancy (Replication) | Data stored with multiple SPs for redundancy. | Increases storage cost and complexity. | Higher retrieval reliability, not a full guarantee. | Trust SPs to serve data; risk of failure if all replicas are down. |
| 3. Retrieval with off-chain SLAs | SPs offer SLAs with performance guarantees (e.g., speed, availability). | Limited to terms agreed in SLA; trust in SP’s infrastructure. | Performance guarantees as per SLA, but not 100% retrieval. | Trust SP to fulfill SLA conditions; dependent on SP infrastructure. |
| 4. Spark Protocol for Optimized Retrieval | Reputation-based system optimizing retrieval by evaluating SPs. | No guarantee of full file availability; relies on past performance. | Higher likelihood of successful retrieval, but no 100% guarantee. | Trust the historical data and reputation system and the way it is implemented (checker node and SP do not collude); SPs’ performance history. |
| 5. Proof of Data Possession (PDP) | Verifies that SP has access to the data. | Doesn’t ensure retrieval speed or availability; only data possession. | Guarantee of data possession, but no retrieval guarantee. | Trust in SP’s possession of data; no guarantee on retrieval. |
| 6. Off-Chain Backup and Caching | Backup and cache data in centralized or third-party systems. | Increases operational complexity and costs; not decentralized. | Full retrieval guarantee from off-chain backups. | Trust centralized systems for retrieval; undermines Filecoin's decentralization. |
| 7. CDN Gateway | Use of CDNs with arbitration and penalties to guarantee retrieval. | Potentially adds external costs and protocol complexity; impacts retrieval speed. | Rational retrievability guarantee via CDN arbitration. | Honest majority of the CDN committee providing the service. |

✨ Key Metrics

When evaluating retrievability, clients need to consider several performance metrics that measure different aspects of the data retrieval process. These metrics ensure that data is not only stored but also accessible and retrievable efficiently when required. Below is a comprehensive list of key retrievability metrics, grouped by their category:

1. Availability Metrics

These metrics measure the likelihood of data being accessible when requested, as well as how quickly data can be recovered when issues arise.
Data Availability: Measures if the requested data is available for retrieval at the time of request. This is critical for ensuring continuous access to data, especially for mission-critical applications.
Redundancy (Replica Availability): Refers to the number of data replicas stored across different storage providers or within the same provider, ensuring data retrieval success even when some replicas are unavailable. This is essential for high-availability scenarios.
Time to Data Recovery: Measures the duration it takes to recover data after it has been lost or corrupted. Fast recovery times are crucial for minimizing disruptions caused by data loss or unavailability.

2. Performance Metrics

These metrics assess the speed and responsiveness of the retrieval process, directly impacting the user experience.
Retrieval Speed (Throughput): Indicates the rate at which data can be retrieved, usually measured in megabytes per second (MB/s) or gigabytes per second (GB/s). It is critical for applications requiring real-time or near-real-time data access, such as streaming services.
Time to First Byte (TTFB): Measures the time from when the retrieval request is made to when the first byte of data is received. A low TTFB is essential for a smooth user experience, particularly in websites or media streaming platforms.
Retrieval Latency: Measures the delay between initiating a retrieval request and receiving the entire data file. Latency is influenced by factors like network congestion, storage provider performance, and geographic distance. It is critical for time-sensitive applications like gaming and video conferencing.
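The three performance metrics can all be derived from a few client-side timestamps of a single retrieval, as sketched below. The `RetrievalTiming` record is hypothetical, latency follows this section's definition (request until the whole file has arrived), and throughput is computed over the transfer phase only, using decimal megabytes.

```python
from dataclasses import dataclass


@dataclass
class RetrievalTiming:
    """Timestamps (in seconds) recorded by the client for one retrieval."""
    request_sent: float
    first_byte: float
    last_byte: float
    bytes_received: int


def performance_metrics(t: RetrievalTiming) -> dict:
    if not t.request_sent <= t.first_byte <= t.last_byte:
        raise ValueError("timestamps must be non-decreasing")
    transfer_s = t.last_byte - t.first_byte
    return {
        "ttfb_s": t.first_byte - t.request_sent,
        # Latency as defined above: request until the whole file has arrived.
        "latency_s": t.last_byte - t.request_sent,
        # Throughput over the transfer phase only, in MB/s (1 MB = 10**6 bytes).
        "throughput_MBps": (t.bytes_received / transfer_s / 1e6
                            if transfer_s > 0 else float("inf")),
    }


# Timestamps would normally come from time.monotonic(); these are made up.
print(performance_metrics(RetrievalTiming(0.0, 0.25, 10.25, 500_000_000)))
```

Measuring throughput from the first byte rather than from the request keeps it independent of TTFB; an end-to-end figure (bytes divided by latency) is also common, so it is worth stating which one is reported.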

3. Reliability Metrics

These metrics reflect the consistency, stability, and success rate of data retrieval.
Retrieval Success Rate: Tracks the percentage of successful retrieval attempts, highlighting the reliability of the retrieval service. A low success rate could indicate issues with storage provider availability or network instability.
Uptime and Reliability: Measures the percentage of time a storage provider’s system is operational and capable of serving retrieval requests. High uptime is essential to minimize service interruptions and ensure continuous access to data.
Error Rate: Indicates the frequency of failed retrieval attempts or instances of corrupted data. A low error rate is important for maintaining data integrity and system reliability.
Data Integrity: Ensures the retrieved data is correct, consistent, and unmodified. This is essential for sensitive or regulatory-compliant data, where corruption or tampering could have serious consequences.
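The reliability metrics can be computed from a log of retrieval attempts once each attempt is classified as failed (nothing returned), corrupted (data returned but failing the integrity check), or ok. In the sketch below the error rate counts both failures and corruption, following the definition above; a plain SHA-256 digest stands in for the content identifier a Filecoin client would verify against, and the input format is hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable, Optional


def classify(data: Optional[bytes], expected_sha256: str) -> str:
    """Label one retrieval attempt as 'failed', 'corrupted', or 'ok'."""
    if data is None:
        return "failed"
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        return "corrupted"  # data came back but fails the integrity check
    return "ok"


def reliability_report(attempts: Iterable[Optional[bytes]], expected_sha256: str) -> dict:
    counts = Counter(classify(a, expected_sha256) for a in attempts)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no attempts recorded")
    return {
        "success_rate": counts["ok"] / total,
        "error_rate": (counts["failed"] + counts["corrupted"]) / total,
        "corruption_rate": counts["corrupted"] / total,
    }


good = b"dataset v1"
digest = hashlib.sha256(good).hexdigest()
attempts = [good, good, None, b"dataset v2"]  # None = the request failed
print(reliability_report(attempts, digest))
```

Reporting corruption separately matters because a response that arrives but fails the check would otherwise look like a success to a metric that only tracks whether bytes came back.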

4. Cost-related Metrics

These metrics help clients optimize the retrieval process in a way that balances performance with cost efficiency.
Cost Efficiency of Retrieval: Evaluates the total cost involved in the retrieval process, including transaction fees, network bandwidth usage, and any service fees charged by the storage providers. This is important for clients looking to optimize storage costs.
Network Bandwidth Usage: Measures the amount of bandwidth consumed during the retrieval process, including both upload and download bandwidth. Efficient use of bandwidth can help minimize retrieval costs and avoid network congestion, especially for clients with bandwidth limitations.
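A simple way to compare retrieval options on cost is to divide everything spent (transaction fees, service fees, bandwidth) by the useful data actually retrieved. In the sketch below all prices are hypothetical and in one arbitrary currency unit; because bandwidth covers everything transferred (upload, download, retries) while the denominator counts only useful bytes, overhead shows up as a higher cost per GiB.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass
class RetrievalCosts:
    """Hypothetical cost components for a batch of retrievals (one currency)."""
    transaction_fees: float        # e.g. gas for on-chain payment transactions
    service_fees: float            # fees charged by the SP or retrieval service
    bandwidth_price_per_gib: float
    bytes_transferred: int         # upload + download, including retries


def cost_per_gib(c: RetrievalCosts, bytes_retrieved: int) -> float:
    """Total cost divided by the useful data retrieved, in currency per GiB."""
    if bytes_retrieved <= 0:
        raise ValueError("bytes_retrieved must be positive")
    bandwidth_cost = c.bandwidth_price_per_gib * c.bytes_transferred / GIB
    total = c.transaction_fees + c.service_fees + bandwidth_cost
    return total / (bytes_retrieved / GIB)


costs = RetrievalCosts(transaction_fees=0.5, service_fees=4.0,
                       bandwidth_price_per_gib=0.05, bytes_transferred=110 * GIB)
print(round(cost_per_gib(costs, 100 * GIB), 4))
```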

5. Quality Metrics

These metrics measure the overall quality of the retrieval process, ensuring a satisfactory user experience.
Quality of Service (QoS): A comprehensive metric that combines factors like speed, reliability, consistency, and packet loss to provide a global measure of the overall experience of data retrieval. A high QoS is essential for services where user satisfaction is key, such as media-heavy applications or customer-facing services.
Data Consistency: Measures how well the data returned during retrieval matches the original data, ensuring it is up-to-date and accurate. This is especially important for systems requiring real-time access to the most accurate version of data, such as real-time analytics and live updates.
Geographic Proximity: Refers to how close the storage provider or data replicas are to the client’s location. Geographically proximate storage reduces latency and enhances the speed and reliability of data retrieval, which is crucial for globally distributed applications.
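Since QoS is described as a combination of several factors, one possible formulation is a weighted average of inputs normalized to [0, 1]. The document does not define a formula; the targets, weights, and normalization below are hypothetical and only show the shape of such a score.

```python
from typing import Dict, Optional

# Hypothetical normalization targets; a real service would pick its own.
TTFB_TARGET_MS = 200.0
THROUGHPUT_TARGET_MBPS = 100.0
DEFAULT_WEIGHTS = {"success": 0.4, "latency": 0.2, "throughput": 0.2, "loss": 0.2}


def qos_score(success_rate: float, ttfb_ms: float, throughput_mbps: float,
              packet_loss: float, weights: Optional[Dict[str, float]] = None) -> float:
    """Weighted QoS score in [0, 1]; higher is better."""
    if ttfb_ms <= 0:
        raise ValueError("ttfb_ms must be positive")
    w = weights or DEFAULT_WEIGHTS
    parts = {
        "success": success_rate,
        # 1.0 at or below the target, decaying as TTFB grows past it.
        "latency": min(1.0, TTFB_TARGET_MS / ttfb_ms),
        "throughput": min(1.0, throughput_mbps / THROUGHPUT_TARGET_MBPS),
        "loss": 1.0 - packet_loss,
    }
    total_weight = sum(w.values())
    return sum(w[k] * parts[k] for k in w) / total_weight


print(round(qos_score(0.9, 400.0, 50.0, 0.01), 3))
```

Capping each component at 1.0 means an SP cannot compensate for poor reliability with excess speed, which matches the emphasis this section puts on reliability metrics.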

Interrelationships and Impact

Availability Metrics: Directly affect the retrieval success rate (RSR). High data availability and redundancy ensure that data can be retrieved even if some systems or replicas are down.
Performance Metrics: Influence retrieval speed and latency, both of which affect the overall user experience. For example, TTFB and throughput are tied to factors like network bandwidth usage and data availability.
Reliability Metrics: Capture retrieval success rate, error rate, and data integrity. High uptime supports high availability, while a low error rate indicates accurate data and successful retrieval attempts.
Cost-related Metrics: Help clients balance performance with costs. High network bandwidth usage and retrieval costs can make data access expensive, especially when retrieval speed and latency are prioritized.
By grouping the metrics into these categories, clients can evaluate retrievability from multiple dimensions, ensuring efficient, reliable, and cost-effective data access.

📊 Summary Table

| Category | Metric | Description | Level of Impact |
| --- | --- | --- | --- |
| Availability Metrics | Data Availability | Measures if data is accessible at the time of request. | High |
| Availability Metrics | Redundancy (Replica Availability) | Tracks the availability of data replicas, increasing retrieval success even if some are unavailable. | High |
| Availability Metrics | Time to Data Recovery | Measures time to recover lost or corrupted data. | High |
| Performance Metrics | Retrieval Speed (Throughput) | Indicates the speed at which data can be retrieved, typically measured in MB/s or GB/s. | High |
| Performance Metrics | Time to First Byte (TTFB) | Measures time from request to the receipt of the first byte of data. | High |
| Performance Metrics | Retrieval Latency | Measures delay between initiating a retrieval request and receiving the full data file. | High |
| Reliability Metrics | Retrieval Success Rate | Tracks the percentage of successful retrieval attempts. | Critical |
| Reliability Metrics | Uptime and Reliability | Measures the operational time of the storage provider’s system and failure frequency. | Critical |
| Reliability Metrics | Error Rate | Indicates the frequency of retrieval failures or corrupted data. | Critical |
| Reliability Metrics | Data Integrity | Ensures that retrieved data is correct, consistent, and unmodified. | High |
| Cost-related Metrics | Cost Efficiency of Retrieval | Evaluates the total costs incurred during the retrieval process, including fees and bandwidth. | Moderate |
| Cost-related Metrics | Network Bandwidth Usage | Measures the bandwidth consumed during the retrieval process, including upload and download. | Moderate |
| Quality Metrics | Quality of Service (QoS) | Measures the overall retrieval experience, including factors like speed, reliability, and packet loss. | High |
| Quality Metrics | Data Consistency | Measures how well retrieved data matches the original, ensuring it is up-to-date and accurate. | Moderate to High |
| Quality Metrics | Geographic Proximity | Indicates the proximity of the storage provider or data replica to the client’s location. | Moderate |

💳 Payment Options

Filecoin offers various payment methods for retrievability, allowing clients to access and retrieve their data from the network. Each option has its own set of benefits, challenges, and trade-offs, enabling clients to select the most suitable route based on their needs and preferences. Below is an overview of the primary payment methods available for retrievability in Filecoin:

1. Off-Chain Payments

Description: Off-chain payments for retrievability occur outside the Filecoin blockchain. These transactions are typically conducted using traditional payment systems. Payments are processed via third-party services or directly between clients and storage providers, often governed by traditional contracts rather than blockchain-based agreements.
Why it matters: While off-chain payments provide familiarity and ease of use, they also introduce certain limitations regarding decentralization and transparency.
Pros:
Familiarity: Clients are often comfortable using common payment methods, like credit cards or digital wallets, which are widely accepted.
Speed: Transactions are processed without the need for blockchain confirmations or gas fees.
Lower Fees: Without the need for blockchain gas fees, off-chain payments generally incur lower transaction costs.
No price fluctuation: Parties are not exposed to token price fluctuations.
Cons:
Not Web3 Native: Off-chain payments are not fully aligned with the Web3 ethos, relying on centralized intermediaries and traditional payment systems instead of the decentralized Filecoin network.
Lack of Transparency: These transactions are not recorded on the blockchain, making it harder to audit or verify the payment history.
Centralization: The involvement of external third parties introduces a layer of centralization, potentially undermining the decentralized nature of Filecoin.
Challenges:
Regulation and Legal Disputes: Off-chain transactions are often governed by traditional contracts, which can lead to legal complexities, especially in disputes or chargebacks.
Limited Automation: Since off-chain payments cannot be integrated with Filecoin’s decentralized ecosystem, clients miss out on the benefits of automated smart contracts for retrievability.

2. On-Chain Payments with Filecoin’s Native Token (FIL)

Description: Clients can pay for retrievability using FIL, Filecoin’s native cryptocurrency. Payments are processed directly on the Filecoin blockchain using smart contracts, offering a decentralized, secure, and transparent payment method.
Why it matters: FIL payments ensure seamless integration with the Filecoin network and its decentralized architecture, but they also come with certain challenges related to price volatility and network congestion.
Pros:
Decentralized and Transparent: Payments are securely recorded on the blockchain, ensuring a transparent, immutable record of all transactions.
Smart Contract Integration: FIL payments can be automated through smart contracts, enabling trustless and self-executing transactions for retrievability.
Global Accessibility: FIL can be accessed by anyone with access to the Filecoin network, regardless of geographic location or traditional banking systems.
Cons:
Transaction Fees: FIL payments incur gas fees, which can fluctuate based on network demand. These fees may be higher during periods of congestion.
Volatility: FIL’s price volatility means that the cost of retrievability can change, making it harder to predict retrieval costs in terms of fiat currency.
Network Congestion: During high-demand periods, on-chain transactions may experience delays or increased costs, impacting the overall speed and reliability of retrievability.
Challenges:
Conversion to FIL: Clients unfamiliar with cryptocurrencies may find it difficult to convert fiat currencies into FIL, adding friction to the payment process.
Scalability Issues (minor, at the current stage): As the Filecoin network grows, higher transaction volumes could lead to slower confirmation times or increased costs, especially if the network is not fully optimized for large-scale transactions.
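The volatility point is simple arithmetic, but it is what makes budgeting hard: a price fixed in FIL has a fiat cost that moves with the exchange rate. The sketch below computes the fiat cost range of a FIL-denominated price across a set of exchange-rate scenarios; the price and the FIL/USD rates are hypothetical scenarios, not market data.

```python
from typing import Iterable, Tuple


def fiat_cost_range(price_in_fil: float,
                    fil_usd_scenarios: Iterable[float]) -> Tuple[float, float]:
    """Min and max USD cost of a FIL-denominated price across exchange-rate scenarios."""
    costs = [price_in_fil * rate for rate in fil_usd_scenarios]
    if not costs:
        raise ValueError("need at least one exchange-rate scenario")
    return min(costs), max(costs)


# Hypothetical: a retrieval plan priced at 10 FIL, with FIL/USD at 2, 3, or 5.
low, high = fiat_cost_range(10.0, [2.0, 3.0, 5.0])
print(f"USD cost between {low:.2f} and {high:.2f}")
```

A price denominated in a stablecoin would stay at its face value across the same scenarios, as long as the peg holds, which is the motivation for the next option.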

3. On-Chain Payments via Filecoin Stablecoin (When Available)

Description: Once introduced, clients can pay for retrievability using Filecoin’s stablecoin. This method combines the benefits of on-chain payments with the price stability typically associated with stablecoins, offering an alternative to FIL for retrievability transactions.
Why it matters: Stablecoins offer predictable costs for retrievability, making it easier for clients to forecast their expenses while still benefiting from the transparency and security of blockchain-based payments.
Pros:
Price Stability: Stablecoins are designed to maintain a fixed value (typically pegged to a fiat currency), which reduces the volatility seen with FIL and makes it easier to predict costs in fiat terms.
Blockchain Security: Like FIL, stablecoin payments are securely recorded on the blockchain, offering a tamper-proof transaction history.
Integration with Smart Contracts: Stablecoin payments can be seamlessly integrated into Filecoin’s decentralized ecosystem, allowing for automated, efficient retrieval transactions.
Cons:
Centralization Risks: Stablecoins are often issued by centralized entities, meaning their value can be influenced by regulatory decisions or the performance of the issuer. This introduces some risk, unlike purely decentralized cryptocurrencies like FIL.
Liquidity and Conversion: Depending on the adoption of the stablecoin, liquidity might be limited, and clients may need to convert other assets into the stablecoin before making payments.
Regulatory Scrutiny: Stablecoins face increasing regulatory attention in many regions, which could affect their usability or availability for retrievability payments.
Challenges:
Adoption and Support: Adoption of stablecoins could be slower in certain markets, which may limit their availability as a payment method for retrievability.
Regulatory Uncertainty: Legal and regulatory changes surrounding stablecoins could impact their use for retrievability payments, especially in regions with more stringent crypto regulations.

4. On-Chain Payments via ERC-20 Tokens

Description: In addition to FIL and stablecoins, clients can pay for retrievability using ERC-20 tokens. These are tokens built on the Ethereum blockchain that follow the ERC-20 standard, allowing them to be used across various decentralized applications (dApps) and blockchains. Payments with ERC-20 tokens enable a broader range of cryptocurrencies to be used within the Filecoin network for retrievability.
Why it matters: ERC-20 tokens provide a flexible and interoperable payment method, allowing clients to leverage existing Ethereum-based assets or tokens for transactions within the Filecoin ecosystem.
Pros:
Interoperability: ERC-20 tokens can be seamlessly used across multiple blockchain platforms, enabling easy integration with Ethereum’s extensive ecosystem.
Variety of Tokens: Clients can use a wide variety of ERC-20 tokens for payments, such as stablecoins, project-specific tokens, or even widely adopted cryptocurrencies like USDT, DAI, or UNI.
Decentralization: Like other on-chain payments, ERC-20 transactions benefit from the transparency, security, and immutability of the blockchain.
Integration with dApps: ERC-20 tokens are already well-supported across decentralized applications, enabling easy integration with smart contracts for retrievability.
Cons:
Gas Fees: Just like FIL, ERC-20 token transactions incur gas fees on the Ethereum network. These fees can be high during times of network congestion and are often unrelated to the value of the transaction itself.
Cross-Chain Limitations: While ERC-20 tokens are widely accepted on Ethereum-based platforms, using them within Filecoin may require additional cross-chain compatibility mechanisms, such as bridges or wrapped tokens, which can introduce additional complexity or delays.
Volatility: Some ERC-20 tokens, especially those not pegged to fiat currencies, can be volatile, which can complicate cost predictions for retrievability.
Challenges:
Network Congestion: Ethereum’s network can sometimes become congested, especially during high-demand periods, leading to slower transaction speeds and higher gas fees for ERC-20 token transactions.
Cross-Chain Compatibility: Interoperability between Filecoin and Ethereum can sometimes be challenging, requiring robust solutions to ensure smooth transactions without introducing security risks or delays.
Complexity: Users unfamiliar with the Ethereum ecosystem may face a learning curve in managing ERC-20 token transactions, including handling wallet setups, gas fees, and token conversions.

📊 Summary Table

| Payment Method | Pros | Cons | Challenges |
| --- | --- | --- | --- |
| Off-Chain Payments | Familiar, quick, lower fees | Not Web3 native, lacks transparency, centralized | Legal disputes, limited automation |
| On-Chain Payments (FIL) | Decentralized, transparent, smart contract integration | Transaction fees, price volatility, network congestion | Conversion to FIL, scalability issues |
| On-Chain Payments (Stablecoin) | Price stability, secure, smart contract integration | Centralization risks, liquidity, regulatory scrutiny | Adoption and support, regulatory uncertainty |
| On-Chain Payments (ERC-20 Tokens) | Interoperability, variety of tokens, decentralized, dApp integration | Gas fees, cross-chain limitations, volatility | Network congestion, cross-chain compatibility, complexity |

🦾 Payment Strategies

In decentralized storage systems like Filecoin, clients and Storage Providers (SPs) need to agree on payment strategies for retrievability. These strategies offer a range of trade-offs in terms of cost, flexibility, and trust. Below is a breakdown of the primary payment strategies for retrievability.

1. Upfront Payment

Description: Clients make a one-time payment upfront, either when the data is stored or before making any retrieval requests. This secures the SP’s payment in advance and is meant to secure its commitment to data availability, though it does not by itself guarantee access to the data.
Pros:
Guaranteed Payment: SPs are assured payment, reducing financial risks.
Predictable Costs: Clients know the exact cost, helping with long-term budgeting.
Reliability: Since the SP has already been paid, they are incentivized to maintain data availability (when behaving honestly).
Cons:
Risk for Clients: Clients face the risk of paying upfront without a guarantee that the SP will fulfill retrieval requests.
High Initial Payment: The large upfront cost can be a barrier for clients with large datasets or limited capital.
No “Pay for Use”: This approach diverges from the pay-for-use model that cloud providers offer and that many types of users consider the norm (since it is fair to both sides).
Challenges and Mitigations:
Trust Issues: Clients need to trust that SPs will fulfill retrieval promises. Introducing escrow systems (e.g., through CDNs or other intermediaries) or reputation-based models can mitigate this risk.
Level of Trust: High trust required. Clients must trust that the SP will deliver as promised; escrow mechanisms or a reputation system can reduce this burden.
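The escrow mitigation can be sketched as follows: the client's upfront payment is locked, one tranche is released to the SP for each confirmed retrieval, and whatever was not released is refunded to the client at expiry. The `RetrievalEscrow` class and its amounts are hypothetical, and the sketch leaves open who confirms a retrieval, which is exactly where a CDN committee or reputation system would come in.

```python
from dataclasses import dataclass


@dataclass
class RetrievalEscrow:
    """Hypothetical escrow holding a client's upfront retrieval payment."""
    deposit: float                # paid upfront by the client
    release_per_retrieval: float  # paid to the SP per confirmed retrieval
    released: float = 0.0
    closed: bool = False

    def confirm_retrieval(self) -> float:
        """Release one tranche to the SP after a retrieval is confirmed."""
        if self.closed:
            raise RuntimeError("escrow already closed")
        amount = min(self.release_per_retrieval, self.deposit - self.released)
        self.released += amount
        return amount

    def close(self) -> float:
        """At expiry, refund whatever was not released back to the client."""
        if self.closed:
            raise RuntimeError("escrow already closed")
        self.closed = True
        return self.deposit - self.released


escrow = RetrievalEscrow(deposit=100.0, release_per_retrieval=30.0)
escrow.confirm_retrieval()
escrow.confirm_retrieval()
print(escrow.close())  # refund to the client
```

This keeps the SP's assurance that funds exist while moving the client from pure upfront risk to paying only for retrievals that actually happened.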

2. Pay-to-Retrieve (Pay-as-you-go)

Description: Clients pay only when they retrieve data. Payment can be fixed, or computed from a set of parameters agreed upfront (for instance, volume of data, retrieval speed, or network congestion at the time of the request).
Pros:
Low Upfront Risk for Clients: Clients only pay when retrieval occurs, making it cost-effective for those with infrequent or unpredictable retrieval needs.
Flexibility: Clients avoid high initial payments.
Cons:
Uncertainty for SPs: SPs may hesitate to offer reliable retrieval services due to uncertain payment schedules. SPs are also at risk of not being paid if a client behaves dishonestly.
Higher Retrieval Costs: Costs can spike during times of high demand or congestion, depending on the payments terms agreed.
Challenges and Mitigations:
Trust Issues: Without a trusted intermediary (either centralized or implemented by a CDN), clients risk non-fulfillment of requests, and SPs risk missing payments.
Level of Trust: Moderate to high trust. Clients must trust SPs to fulfill requests once payment is made, and SPs must trust that payment is made when the data is served to the client. Implementing reputation systems or services like retriev.org can help mitigate these concerns.
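A parameter-based pay-to-retrieve price can be sketched as a base fee plus a per-GiB charge, scaled for a faster tier and for congestion at request time. Capping the congestion multiplier at a value agreed upfront is one way to bound the cost spikes listed under Cons. Every parameter name and value below is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PriceTerms:
    """Hypothetical per-retrieval pricing terms agreed when the deal is made."""
    base_fee: float                   # fixed fee per request
    price_per_gib: float
    fast_tier_multiplier: float       # applied when the client asks for fast retrieval
    max_congestion_multiplier: float  # cap agreed upfront to bound price spikes


def retrieval_price(terms: PriceTerms, gib: float, fast: bool, congestion: float) -> float:
    """Price of one retrieval.

    congestion is a hypothetical network-load multiplier measured at request
    time (1.0 = normal load; discounts below 1.0 are not modeled here).
    """
    if gib < 0 or congestion < 1.0:
        raise ValueError("gib must be >= 0 and congestion >= 1.0")
    variable = gib * terms.price_per_gib
    if fast:
        variable *= terms.fast_tier_multiplier
    variable *= min(congestion, terms.max_congestion_multiplier)
    return terms.base_fee + variable


terms = PriceTerms(base_fee=0.1, price_per_gib=0.02,
                   fast_tier_multiplier=1.5, max_congestion_multiplier=2.0)
print(round(retrieval_price(terms, 50, fast=False, congestion=1.0), 4))
print(round(retrieval_price(terms, 50, fast=True, congestion=3.0), 4))  # congestion capped at 2.0
```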

3. Periodical Payment (Unlimited Retrieval)

Description: Clients pay a fixed amount periodically (e.g., a monthly or annual subscription) for unlimited retrievals within that period, ensuring ongoing access to data.
Pros:
Predictable and Recurring Payments: Ideal for clients with ongoing data needs.
Guaranteed Data Availability: Clients are assured that they can retrieve data anytime during the subscription period (assuming SPs are behaving honestly).
Cons:
Risk of Overpayment: Clients may end up paying more than they actually use, especially if they don't retrieve data regularly.
Reduced Incentives for SPs: SPs get paid regardless of data retrieval, potentially reducing the motivation to optimize services.
Challenges and Mitigations:
Misalignment of Incentives: Introducing performance-based bonuses or penalties for SPs can help align their interests with clients’ needs.
Flexibility: Offering the ability to adjust plans based on actual usage could mitigate overpayment.
Level of Trust: Moderate trust. Clients must trust that SPs will fulfill their retrieval obligations over time. Using reputation systems and contractual enforcement, or integrating a CDN-based fallback mechanism, can reduce risks.
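The overpayment risk can be quantified by comparing the subscription fee with what the same usage would have cost on a pay-per-use basis. The sketch below assumes a single per-GiB reference price; both function names are hypothetical.

```python
def break_even_gib(subscription_fee: float, price_per_gib: float) -> float:
    """Retrieval volume per period at which the flat subscription equals pay-per-use cost."""
    if price_per_gib <= 0:
        raise ValueError("price_per_gib must be positive")
    return subscription_fee / price_per_gib


def overpayment(subscription_fee: float, price_per_gib: float, used_gib: float) -> float:
    """Extra amount paid under the subscription versus pay-per-use (0 if the plan was worth it)."""
    return max(0.0, subscription_fee - used_gib * price_per_gib)
```

For example, a 50-unit plan against a reference price of 0.5 per GiB breaks even at 100 GiB per period; a client that retrieves only 40 GiB overpays by 30. Tracking this figure over several periods gives the usage data needed to adjust plans, as suggested under Challenges and Mitigations.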

4. Retrievability Tickets

Description: Clients purchase a set number of retrievability tickets in advance (either directly from SPs or from CDNs/Onramps, depending on the SP selection strategy they decide to pursue, see Deal-Making Process for further details), each ticket granting a specific amount of data retrieval. Once tickets are consumed, clients buy more.
Pros:
Flexible and Predictable: Clients can budget based on expected retrieval needs.
Transparent Costs: It’s clear how much clients are paying for each retrieval.
Cons:
Limited Flexibility: The number of retrievals is tied to ticket availability.
Unused Tickets: If tickets aren’t used within a certain period, clients may lose their value.
Challenges and Mitigations:
Ticket Expiration: To address potential waste, offer non-expiring tickets or partial refunds for unused tickets.
Dynamic Pricing: Adjust ticket pricing based on data retrieval demand to make the model more flexible.
Level of Trust: Moderate trust. Clients trust that the ticket system will be honored, though they still need to rely on the SP to fulfill the retrieval requests.
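A ticket balance is essentially a prepaid, possibly expiring, byte allowance. The minimal sketch below uses hypothetical `Ticket` and `TicketWallet` types (whoever issues the tickets, SP or CDN/Onramp, would keep an equivalent ledger). It spends the soonest-expiring tickets first to reduce the waste mentioned under Unused Tickets, and models non-expiring tickets with an infinite expiry time.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    remaining_bytes: int
    expires_at: float  # Unix time; float("inf") models a non-expiring ticket


@dataclass
class TicketWallet:
    tickets: list[Ticket] = field(default_factory=list)

    def balance(self, now: float) -> int:
        """Bytes still retrievable with tickets that have not expired."""
        return sum(t.remaining_bytes for t in self.tickets if t.expires_at > now)

    def redeem(self, size: int, now: float) -> bool:
        """Debit `size` bytes, spending the soonest-expiring tickets first.

        Returns False and debits nothing if valid tickets cannot cover the request.
        """
        if size > self.balance(now):
            return False
        for t in sorted(self.tickets, key=lambda tk: tk.expires_at):
            if size == 0:
                break
            if t.expires_at <= now:
                continue  # expired tickets cannot be spent
            take = min(t.remaining_bytes, size)
            t.remaining_bytes -= take
            size -= take
        # Discard exhausted and expired tickets.
        self.tickets = [t for t in self.tickets if t.remaining_bytes > 0 and t.expires_at > now]
        return True
```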

5. Hybrid Payments

Description: A combination of upfront payment and pay-as-you-go models, where clients pay a base amount upfront for guaranteed access and additional fees for any extra retrievals.
Pros:
Balanced Risk and Flexibility: Clients get predictability for core retrievals, with flexibility for additional requests.
Risk Sharing: Both the client and SP share the risk—clients pay a base price while SPs are incentivized to offer reliable additional retrievals.
Cons:
Complexity: This model can be more complicated to manage and understand for both parties.
Potential Overpayment: If the upfront payment is too high or retrievals are infrequent, clients may end up paying more than necessary.
Challenges and Mitigations:
Clarity: Clear communication about what the upfront payment covers versus additional retrieval costs will reduce confusion.
Monitoring: Clients should be able to track usage and adjust payments accordingly.
Level of Trust: Moderate trust. Clients must trust that both the upfront payment and variable components of the strategy will be honored by the SP. Reputation systems and CDNs can help enforce this.
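A hybrid plan's bill can be itemized so that clients see exactly what the base payment covered, which addresses the Clarity and Monitoring points above. This is a sketch with hypothetical parameter names; it assumes the upfront payment buys a fixed included volume per period.

```python
def hybrid_bill(base_fee: float, included_gib: float, overage_per_gib: float, used_gib: float) -> dict:
    """Itemized bill for one period of a hybrid plan.

    The base fee covers up to `included_gib`; anything beyond is charged per GiB.
    Returning the breakdown, not just the total, shows what the base fee covered.
    """
    overage_gib = max(0.0, used_gib - included_gib)
    overage_fee = overage_gib * overage_per_gib
    return {
        "base_fee": base_fee,
        "included_gib": included_gib,
        "overage_gib": overage_gib,
        "overage_fee": overage_fee,
        "total": base_fee + overage_fee,
    }
```

A client on a 20-unit base plan with 100 GiB included and 0.25 per extra GiB pays 20 for any usage up to 100 GiB, and 32.5 for 150 GiB.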

📊 Summary Table

| Payment Strategy | Description | Pros | Cons | Challenges | Level of Trust |
|---|---|---|---|---|---|
| Upfront Payment | Clients pay in advance for guaranteed retrieval access. | Guaranteed payment for SP; predictable costs for clients. | High initial cost; risk if SP fails to deliver. | Risk of overpayment; reliance on reputation systems/contracts/CDNs. | High trust (client trusts SP’s commitment). |
| Pay-to-Retrieve | Clients pay when retrieval occurs, based on volume and speed. | Flexible, low upfront costs. | Uncertainty for SP; potentially higher retrieval costs. | Risk of failure to fulfill requests; need for reputation systems/contracts/CDNs. | Moderate to high trust (client trusts SP to deliver). |
| Periodical Payment | Clients make recurring payments for unlimited retrievals. | Predictable, steady costs; guaranteed availability. | Risk of overpayment; reduced SP incentive. | Misaligned incentives; need for performance-based bonuses. | Moderate trust (trust in long-term reliability). |
| Retrievability Tickets | Clients buy tickets in advance for specific retrievals. | Flexible, predictable costs. | Limited flexibility; unused tickets may expire. | Ticket expiration; dynamic pricing needed. | Low to moderate trust (client trusts the ticket system is honored and can stop buying tickets if it is not). |
| Hybrid Strategy | Combination of upfront payment and pay-as-you-go. | Balances flexibility and security. | Complexity; risk of overpayment. | Clarity in terms; need for smart contract enforcement. | Moderate trust (client trusts both upfront and variable components). |

💾 Retrieval Services: SP Selection Strategies

Selecting Storage Providers (SPs) for retrievability in Filecoin involves balancing control, cost, reliability, and trust. We identify two important categories of the selection task: the deal-making process and the selection mechanism.

Deal-Making Process

The deal-making process involves how the client interacts with the storage provider to agree on terms and initiate the retrieval. There are two key alternatives to consider:
1. Direct Negotiation: In this scenario, the client and SP engage directly to establish the terms of the retrieval deal. This method gives the client full control over the selection process but requires the client's direct effort to assess and negotiate terms like cost, location, and performance guarantees. It also carries risks, such as potential unreliability of the SP or high overhead for the client.
2. Automated or Delegated Deal-Making: This approach involves intermediaries or automation, such as content delivery networks (CDNs), auction systems, or smart contracts that facilitate or automatically execute the deal-making process. These methods reduce the client's manual involvement and can help ensure better terms by integrating performance metrics, market-based competition, or decentralized contract execution. However, they may come with additional costs, reduced control, or dependence on third parties for dispute resolution or performance verification.

Selection Mechanisms

Once the deal-making process is determined, the actual selection mechanism defines how the SPs are chosen. These can range from reputation-based systems to auction-based or automated selections.

1. Reputation-Based System

Description: SPs are selected based on historical performance metrics (e.g., retrieval success rates, uptime). Clients rely on reputation scores (e.g., Spark protocol).
Pros:
Helps ensure reliable and fast retrievals
Mitigates the risk of choosing unreliable SPs
Cons:
Reputation scores may fluctuate, leading to unpredictability
Relies on honest data, which could be manipulated
Trust Issues:
Reputation can be tampered with unless verified via cryptographic proofs or trusted oracles.
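As an illustration of how raw measurements could be turned into a score, the sketch below smooths each SP's retrieval success rate with a Beta prior, so that SPs with very little history are not ranked above SPs with long track records. This is an assumed scoring rule for illustration only, not the formula used by Spark or any other reputation system, and it does nothing against tampering: it trusts the counts it is given.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalStats:
    successes: int  # retrieval checks that returned the expected data
    attempts: int   # total retrieval checks performed against this SP


def reputation(stats: RetrievalStats, prior_successes: float = 1.0, prior_failures: float = 1.0) -> float:
    """Smoothed success rate: posterior mean under a Beta(prior_successes, prior_failures) prior.

    With no history the score equals the prior mean (0.5 by default), so a single
    lucky retrieval does not outrank a long, mostly successful track record.
    """
    if not 0 <= stats.successes <= stats.attempts:
        raise ValueError("need 0 <= successes <= attempts")
    return (stats.successes + prior_successes) / (stats.attempts + prior_successes + prior_failures)


def rank_providers(stats_by_sp: dict[str, RetrievalStats]) -> list[str]:
    """SP identifiers ordered from highest to lowest smoothed score."""
    return sorted(stats_by_sp, key=lambda sp: reputation(stats_by_sp[sp]), reverse=True)
```

With the default prior, an SP with 1 success out of 1 check scores 2/3, while one with 95 out of 100 scores 96/102 and ranks first.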

2. Auction-Based Selection

Description: SPs bid to handle the client’s retrieval request. The client selects based on the best offer (e.g., lowest price, best service).
Pros:
Competitive pricing, potentially lowering costs
Flexibility in selecting based on price and service quality
Cons:
Risk of race-to-the-bottom, where SPs cut corners to offer lower prices
Complex auction setup and bidding process
Trust Issues:
Clients must trust that the auction process is fair and transparent.
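A minimal sketch of the client-side selection step in a sealed-bid auction, assuming each bid carries a price and a promised time-to-first-byte (the `Bid` type and `select_winner` function are hypothetical). Filtering out bids that miss a service floor before comparing prices is one simple guard against the race-to-the-bottom risk listed under Cons.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Bid:
    sp_id: str
    price: float              # asking price for serving the retrieval
    promised_latency_ms: int  # time-to-first-byte the SP commits to


def select_winner(bids: Iterable[Bid], max_latency_ms: int) -> Optional[Bid]:
    """Cheapest bid that meets the client's latency ceiling; ties go to the faster SP."""
    eligible = [b for b in bids if b.promised_latency_ms <= max_latency_ms]
    if not eligible:
        return None  # no SP meets the service floor
    return min(eligible, key=lambda b: (b.price, b.promised_latency_ms))
```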

3. Automated SP Selection via Smart Contracts

Description: Smart contracts automate SP selection based on predefined criteria (e.g., price, performance, reputation). They can use oracles or on-chain data for execution.
Pros:
Eliminates manual selection
Ensures up-to-date, relevant data for SP selection
Cons:
Limited flexibility after contract deployment
Relies on accurate off-chain data from oracles
Trust Issues:
Trust in smart contract security and off-chain data accuracy.
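The selection logic such a contract might encode can be sketched as a fixed, weighted score over normalized metrics. This is an off-chain Python illustration under assumed metrics and weights (`SPMetrics`, `WEIGHTS`, and `select` are hypothetical); an on-chain version would read its inputs from an oracle or on-chain data and would typically use integer or fixed-point arithmetic instead of floats. Keeping the weights in a constant mirrors the limited flexibility after deployment noted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SPMetrics:
    sp_id: str
    price: float        # quoted retrieval price; lower is better
    latency_ms: float   # measured time-to-first-byte; lower is better
    reputation: float   # score in [0, 1]; higher is better


# Fixed "at deployment": the criteria and their weights cannot change afterwards.
WEIGHTS = {"price": 0.4, "latency": 0.3, "reputation": 0.3}


def _normalize(values: list[float], lower_is_better: bool) -> list[float]:
    """Min-max scale to [0, 1] so that 1.0 is always the best value."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all candidates tie on this metric
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if lower_is_better else scaled


def select(candidates: list[SPMetrics], k: int = 1) -> list[str]:
    """IDs of the k highest-scoring SPs under the fixed weights."""
    if not candidates:
        return []
    price = _normalize([c.price for c in candidates], lower_is_better=True)
    latency = _normalize([c.latency_ms for c in candidates], lower_is_better=True)
    rep = _normalize([c.reputation for c in candidates], lower_is_better=False)
    scores = {
        c.sp_id: WEIGHTS["price"] * p + WEIGHTS["latency"] * lat + WEIGHTS["reputation"] * r
        for c, p, lat, r in zip(candidates, price, latency, rep)
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]
```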

4. Content Addressing and IPFS Retrieval

Description: Clients retrieve data based on a content identifier (CID), mapped to specific SPs (or a set of SPs) storing the content (where this mapping is stored in an IPNI like cid.contact or a DHT like the Amino DHT). This can be done using IPFS or compatible systems.
Pros:
No third-party intermediaries needed
Simple retrieval if data is available through IPFS
Cons:
Lack of dynamic selection based on retrieval conditions (e.g., cost, SP performance)
Potential availability issues if the CID isn’t widely replicated
Trust Issues:
Risk of retrieval failure if content is not replicated sufficiently across SPs.
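The property that removes the need for intermediaries is that content addressing makes responses self-verifying: the client recomputes the identifier from the bytes it receives. The sketch below tries each advertised provider in turn and accepts the first response that verifies. It is deliberately simplified: `content_id` is a toy stand-in (hex SHA-256 of the raw bytes) rather than a real multibase-encoded CID, real data is usually chunked into a DAG and verified block by block, and the provider list and `fetch` callable are hypothetical placeholders for a lookup in an IPNI or DHT and an actual transfer protocol.

```python
import hashlib
from typing import Callable, Iterable, Optional


def content_id(data: bytes) -> str:
    """Toy identifier: hex SHA-256 of the raw bytes (NOT a real multibase-encoded CID)."""
    return hashlib.sha256(data).hexdigest()


def retrieve(
    cid: str,
    providers: Iterable[str],
    fetch: Callable[[str, str], Optional[bytes]],
) -> Optional[bytes]:
    """Try each provider advertised for `cid`; return the first response that verifies."""
    for sp in providers:
        try:
            data = fetch(sp, cid)
        except OSError:
            continue  # unreachable provider: fall through to the next replica
        if data is not None and content_id(data) == cid:
            return data  # bytes hash to the requested identifier, so no trust in `sp` is needed
    return None  # no provider returned verifiable content
```

A provider that returns wrong bytes is treated the same as an unreachable one, which is why sufficient replication across SPs matters more here than trust in any single SP.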

5. Service-Level Agreement (SLA) Based Selection

Description: SPs offer Service-Level Agreements (SLAs), providing guarantees for (a subset of) retrieval speed, availability, and quality.
Pros:
Clear, enforceable guarantees for performance.
Financial penalties for missed SLA targets.
Cons:
SLAs often come at a higher cost.
Enforcing SLAs in a decentralized environment can be complex.
Additional costs (e.g., legal support) when writing, evaluating, and signing the SLA.
Trust Issues:
Clients need to trust that SLAs will be respected, and enforcement mechanisms will be effective.
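One common way to express the financial penalties mentioned above is a service credit proportional to the shortfall below an availability target. The sketch assumes availability is measured as the fraction of successful retrieval checks in a billing period; the target, refund rate, and cap are hypothetical SLA parameters. Who performs and attests those checks is exactly the enforcement problem listed under Cons.

```python
def sla_refund(successful_checks: int, total_checks: int, target: float,
               fee: float, refund_per_point: float, cap: float = 1.0) -> float:
    """Service credit owed to the client for one billing period.

    Availability = successful_checks / total_checks. For each percentage point below
    `target`, `refund_per_point` (a fraction of the fee) is refunded, up to `cap` of the fee.
    """
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    availability = successful_checks / total_checks
    shortfall_points = max(0.0, (target - availability) * 100)
    return min(cap, shortfall_points * refund_per_point) * fee
```

For instance, with a 99% target, a 10% refund per missed percentage point, and a fee of 100, a measured availability of 97% yields a refund of about 20, while 99.5% yields nothing.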

6. Redundancy-Based Selection (Replication)

Description: Multiple SPs are picked to replicate data, ensuring that if one SP fails, another can deliver the data.
Pros:
Increased reliability and availability
Reduces the risk of retrieval failures
Cons:
Higher storage costs due to replication
Coordination required to ensure availability of multiple SPs
Trust Issues:
Clients must trust that replicated copies are accurate and available when needed.
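If each SP is independently available with probability p, the chance that at least one of n replicas can serve the data is 1 - (1 - p)^n. The sketch below uses this to find how many replicas reach a target availability. The independence assumption is optimistic: SPs sharing an operator, region, or network tend to fail together, so the result is a lower bound on the replicas actually needed.

```python
def availability(per_sp_availability: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent copies can be retrieved."""
    return 1.0 - (1.0 - per_sp_availability) ** replicas


def min_replicas(per_sp_availability: float, target: float, max_replicas: int = 100) -> int:
    """Smallest number of replicas whose combined availability reaches `target`."""
    if not 0.0 < per_sp_availability <= 1.0:
        raise ValueError("per_sp_availability must be in (0, 1]")
    for n in range(1, max_replicas + 1):
        if availability(per_sp_availability, n) >= target:
            return n
    raise ValueError("target not reachable within max_replicas")
```

With p = 0.9, three replicas give 99.9% availability and four are needed to reach 99.95%; with p = 0.5, seven are needed for 99%.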

📊 Summary Tables

| Deal-Making Process | Description | Pros | Cons |
|---|---|---|---|
| Direct Negotiation | The client and SP engage directly to establish terms for the retrieval deal. The client has full control over selection, negotiating factors like cost, location, and performance guarantees. | Full control over the selection process; customizable terms based on specific needs; direct interaction with SP | Requires manual effort and overhead; potential for selecting unreliable SPs; time-consuming and may be less efficient |
| Automated or Delegated Deal-Making | Involves intermediaries or automation like CDNs, auction systems, or smart contracts that facilitate or automate the deal-making process, reducing client effort. | Reduces manual effort; better terms through performance metrics and competition; decentralized contract execution (via smart contracts) | Additional costs (e.g., intermediary fees); less control over selection process; dependence on third parties for dispute resolution or performance verification |

| SP Selection Mechanism | Description | Pros | Cons | Trust Issues |
|---|---|---|---|---|
| Reputation-Based System | SPs are selected based on historical performance metrics (e.g., retrieval success rates, uptime). Clients rely on reputation scores (e.g., Spark protocol). | Helps ensure reliable and fast retrievals; mitigates the risk of choosing unreliable SPs | Reputation scores may fluctuate, leading to unpredictability; relies on honest data, which could be manipulated | Reputation can be tampered with unless verified via cryptographic proofs or trusted oracles |
| Auction-Based Selection | SPs bid to handle the client’s retrieval request, and the client selects based on the best offer (e.g., lowest price, best service). | Competitive pricing, potentially lowering costs; flexibility in selecting based on price and service quality | Risk of race-to-the-bottom, where SPs cut corners to offer lower prices; complex auction setup and bidding process | Clients must trust that the auction process is fair and transparent |
| Automated SP Selection via Smart Contracts | Smart contracts automate SP selection based on predefined criteria (e.g., price, performance, reputation). Can use oracles or on-chain data for execution. | Eliminates manual selection; ensures up-to-date, relevant data for SP selection | Limited flexibility after contract deployment; relies on accurate off-chain data from oracles | Trust in smart contract security and off-chain data accuracy |
| Content Addressing and IPFS Retrieval | Clients retrieve data based on a content identifier (CID), mapped to specific SPs storing the content (using IPFS or compatible systems). | No third-party intermediaries needed; simple retrieval if data is available through IPFS | Lack of dynamic selection based on retrieval conditions (e.g., cost, SP performance); potential availability issues if the CID isn’t widely replicated | Risk of retrieval failure if content is not replicated sufficiently across SPs |
| Service-Level Agreement (SLA) Based Selection | SPs offer Service-Level Agreements (SLAs) providing guarantees for retrieval speed, availability, and quality. | Clear, enforceable guarantees for performance; financial penalties for missed SLA targets | SLAs often come at a higher cost; enforcing SLAs in a decentralized environment can be complex | Clients need to trust that SLAs will be respected and that enforcement mechanisms will be effective |
| Redundancy-Based Selection (Replication) | Multiple SPs are selected to replicate data, ensuring that if one SP fails, another can deliver the data. | Increased reliability and availability; reduces the risk of retrieval failures | Higher storage costs due to replication; coordination required to ensure availability of multiple SPs | Clients must trust that replicated copies are accurate and available when needed |

🧮 Retrieval Services: Client Selection Strategies

Just as Clients need to carefully select Storage Providers, Storage Providers (SPs) need to choose which clients' retrieval requests they will fulfill. As with SP selection by Clients (see the dedicated section above), this process can be broken down into two key components: the deal-making process and the selection mechanism.

Deal-Making Process

The deal-making process defines how the Storage Provider (SP) interacts with the client to establish the terms of the retrieval agreement. There are two primary methods for SPs to engage with clients:
1. Direct Negotiation: In this scenario, the SP and the client engage in direct communication to negotiate the terms of the retrieval deal. The SP has full control over the terms, such as price, retrieval times, performance guarantees, and service conditions. This gives the SP flexibility to tailor the terms to their needs but may come with risks such as extended negotiation periods or the possibility of misunderstandings. It also demands more time and effort from the SP to handle each individual client’s needs.
2. Automated or Delegated Deal-Making: In this approach, intermediaries or automated systems, such as smart contracts, content delivery networks (CDNs), or auction systems, handle the deal-making process on behalf of the SP. This reduces the manual workload for the SP and can optimize terms based on real-time data or market-driven forces, helping to secure competitive deals. However, this method introduces trade-offs such as reduced control, reliance on third-party systems, and potential additional costs related to intermediaries.
The deal-making process sets the stage for how the SP will structure the retrieval agreement with clients, with each approach offering a different balance of control, efficiency, and risk.

Client Selection Mechanisms

1. First-Come, First-Served (FCFS)

Description: Requests are processed in the order they are received.
Pros:
Simple and transparent, with no prioritization needed.
Cons:
Can be inefficient during peak demand; no differentiation between clients.
Trust Issues:
Clients are trusted to actually pay for the retrievability service.
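FCFS maps directly onto a queue. The hypothetical scheduler below serves at most a fixed number of requests per round in strict arrival order, which makes the lack of differentiation visible: during peak demand, every request waits behind whatever arrived earlier, regardless of who sent it.

```python
from collections import deque


class FCFSScheduler:
    """Serves retrieval requests strictly in arrival order, up to a fixed capacity per round."""

    def __init__(self, capacity_per_round: int):
        if capacity_per_round < 1:
            raise ValueError("capacity_per_round must be at least 1")
        self.capacity = capacity_per_round
        self.queue: deque[tuple[str, str]] = deque()  # (client_id, cid) pairs

    def submit(self, client_id: str, cid: str) -> None:
        self.queue.append((client_id, cid))

    def next_round(self) -> list[tuple[str, str]]:
        """Dequeue up to `capacity` requests, oldest first; the rest keep waiting."""
        served = []
        while self.queue and len(served) < self.capacity:
            served.append(self.queue.popleft())
        return served
```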

2. Reputation-Based Client Selection

Description: Priority is given to Clients with better reputation scores, based on past performance and payment history.
Pros:
Reduces risk by selecting reliable clients.
Cons:
New clients may be disadvantaged; reputation systems can be manipulated.
Trust Issues:
Selection depends on the reliability and security of the reputation system.

3. Long-Term Contracts / Service-Level Agreements (SLAs)

Description: Long-term contracts guaranteeing retrieval performance are offered in exchange for steady payments.
Pros:
Predictable revenue; stable relationships with clients.
Cons:
Limited flexibility; risk of underutilized capacity.
Trust Issues:
Requires both parties to trust the contract terms and enforcement.

4. Payment-Driven Client Selection

Description: Clients are prioritized based on how much they’re willing to pay for retrieval.
Pros:
Maximizes revenue by focusing on higher-paying clients.
Cons:
May exclude clients with lower budgets.
Trust Issues:
Requires trust in the fairness of the payment model.
Clients are trusted to actually pay for the retrievability service.
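Payment-driven selection is naturally a priority queue keyed on the offered price. The sketch below (a hypothetical `PaymentPriorityQueue`) pops the highest offer first and breaks ties by arrival order. It shows the exclusion risk directly: while higher offers keep arriving, a low-budget request may wait indefinitely. The offered price is only a promise, so in practice it would need to be backed by escrow or prepayment.

```python
import heapq
import itertools
from typing import Optional


class PaymentPriorityQueue:
    """Serves the highest offered price first; equal offers are served in arrival order."""

    def __init__(self) -> None:
        self._heap: list[tuple[float, int, str]] = []
        self._arrival = itertools.count()  # monotonically increasing tie-breaker

    def submit(self, client_id: str, offered_price: float) -> None:
        # heapq is a min-heap, so the price is negated to pop the highest offer first.
        heapq.heappush(self._heap, (-offered_price, next(self._arrival), client_id))

    def pop(self) -> Optional[tuple[str, float]]:
        """Return (client_id, offered_price) for the next request to serve, or None if idle."""
        if not self._heap:
            return None
        neg_price, _, client_id = heapq.heappop(self._heap)
        return client_id, -neg_price
```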

5. Automated Algorithmic Client Selection

Description: Clients are algorithmically selected based on factors like price, reputation, and past performance.
Pros:
Efficient, objective, and consistent selection.
Cons:
Lack of flexibility if client needs change after the setup.
Trust Issues:
Requires trust in the algorithm’s fairness and transparency.

6. Redundancy-Based Client Selection

Description: Clients are prioritized based on data replication across multiple providers to ensure availability.
Pros:
Increases reliability and reduces the risk of retrieval failure.
Cons:
Higher storage costs and coordination required.
Trust Issues:
Replication does not, by itself, guarantee actual retrievability.

📊 Summary Tables

| Deal-Making Process | Description | Pros | Cons | Trade-Offs/Considerations |
|---|---|---|---|---|
| Direct Negotiation | SP and client negotiate terms directly. | Full control over terms (price, performance, guarantees) | Extended negotiation periods; potential for misunderstandings; time and effort required from SP | Flexibility vs. potential delays and risks |
| Automated or Delegated Deal-Making | Intermediaries or automated systems (e.g., smart contracts, CDNs, auction systems) handle the process for SP. | Reduces manual workload; optimizes terms via real-time data and market forces; helps secure competitive deals | Reduced control for the SP; reliance on third-party systems; potential additional costs | Efficiency and optimization vs. control and reliance on third parties |

| Client Selection Mechanism | Description | Pros | Cons | Trust Issues |
|---|---|---|---|---|
| First-Come, First-Served (FCFS) | Requests are processed in the order they are received. | Simple, transparent, no prioritization needed | Inefficient during peak demand; no differentiation between clients | Clients are trusted to be paying for retrievability service |
| Reputation-Based Client Selection | Priority is given to clients with better reputation scores based on past performance and payment history. | Reduces risk by selecting reliable clients | New clients may be disadvantaged; reputation systems can be manipulated | Relies on the security and reliability of the reputation system |
| Long-Term Contracts / SLAs | Long-term contracts guaranteeing retrieval performance in exchange for steady payments. | Predictable revenue; stable relationships with clients | Limited flexibility; risk of underutilized capacity | Requires trust in the contract terms and enforcement by both parties |
| Payment-Driven Client Selection | Clients are prioritized based on how much they are willing to pay for retrieval. | Maximizes revenue by focusing on higher-paying clients | Excludes clients with lower budgets | Requires trust in the fairness of the payment model; clients must be paying for retrievability |
| Automated Algorithmic Client Selection | Clients are selected algorithmically based on price, reputation, and past performance. | Efficient, objective, and consistent selection | Lack of flexibility if client needs change after setup | Trust in the fairness and transparency of the algorithm |
| Redundancy-Based Client Selection | Clients are prioritized based on data replication across multiple providers to ensure availability. | Increases reliability; reduces the risk of retrieval failure | Higher storage costs and coordination required | Replication doesn’t guarantee actual retrievability; requires trust in overall system |

👣 Next Steps

Retrievability guarantees for data stored on the Filecoin Network are essential for its long-term success and sustainability. However, we also recognize that each user may have distinct needs, preferences, and requirements when it comes to data accessibility and security guarantees.
To address this, we foresee a modular approach that allows users to select from a diverse range of services and combine them in a way that meets their specific retrievability and reliability goals. This flexibility will enable users to tailor their storage solutions to their unique use cases, ensuring both customization and scalability.
A promising path forward for enhancing retrievability guarantees on the Filecoin Network involves integrating advanced protocols and tools. By leveraging technologies and protocols like CDN Gateways, reputation systems, smart contract-powered storage solutions and incentives, we can create a more robust and reliable infrastructure.
These combined innovations will not only improve data accessibility and security but will also foster the overall growth and resilience of the Filecoin ecosystem.

📚 Additional Resources
