If you're a blockchain developer, you probably have a lot of questions about nodes and how they work. What are blockchain nodes? Why is it hard to run your own Ethereum node? What is a node provider, and why do I need one?
Trust me, it's pretty confusing, so here is an easy introduction from Cool Universe.
What are nodes in a blockchain?
Let's start with the basics! A node is essentially a program running on a single computer that allows you to connect with the rest of the blockchain. It connects to other nodes to send information back and forth, verifies that transactions sent between nodes are valid, and stores important information about the blockchain.
A blockchain is essentially made up of many different nodes. In other words, the physical hardware that runs a blockchain such as Ethereum or Bitcoin is a collection of nodes scattered around the world and run by individuals. A blockchain has no master server or single physical source, which is why it is decentralized.
It's important to note that you can't access the information on the blockchain without using nodes, so you can simply think of nodes as a browser for the blockchain.
A "blockchain" is a collection of individually run computers (nodes) that collectively participate in validating the state of the blockchain according to certain rules.
You interact with a node by sending it requests and receiving its responses through an API (application programming interface). Suppose you are running a node on port 8545 of your computer.
You can send the following request (you can also try it online using Alchemy Composer). It asks your node to return the number of the most recently produced block by calling the eth_blockNumber method. Here is a sample response:
As you can see, the latest block in this example is 0xA1C054, which is 10600532 in decimal.
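The request and response themselves are not reproduced in this copy. As a minimal sketch, assuming a node that serves the standard Ethereum JSON-RPC API over HTTP at http://localhost:8545, here is the same eth_blockNumber round trip in Python using only the standard library. The helper names are hypothetical, and the response string is the example value from above, not live data.

```python
import json
import urllib.request

# Assumption: an Ethereum node is serving JSON-RPC over HTTP at this address.
LOCAL_NODE_URL = "http://localhost:8545"


def build_block_number_request(url: str) -> urllib.request.Request:
    """Build a JSON-RPC 2.0 POST request for the eth_blockNumber method."""
    payload = {"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 0}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_block_number(response_body: str) -> int:
    """Read the hex-encoded "result" field and convert it to a Python int."""
    return int(json.loads(response_body)["result"], 16)


def latest_block_number(url: str = LOCAL_NODE_URL) -> int:
    """Send the request to a running node and return the latest block number."""
    with urllib.request.urlopen(build_block_number_request(url), timeout=10) as resp:
        return parse_block_number(resp.read().decode("utf-8"))


# Offline check with a response shaped like the example above (no node needed).
sample_response = '{"jsonrpc": "2.0", "id": 0, "result": "0xa1c054"}'
print(parse_block_number(sample_response))  # 10600532
```

Calling latest_block_number() only works once a node is actually listening on port 8545; the parsing step is kept separate so it can be checked without one.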
Why is it difficult to run a node?
Several things make it difficult to run your own node for development. Here are some of the main reasons:
Nodes take a long time to set up, sometimes weeks!
Every developer hates spending a lot of time setting up tooling instead of building things, and nodes are one of the worst offenders.
There are generally two main types of nodes: light nodes and full nodes.
A light node only synchronizes block headers and requests everything else from full nodes, while a full node stores the entire state of the blockchain, including every transaction ever made. Many queries can be served by light nodes, but full nodes are the backbone of the blockchain and are needed to provide most of the information.
Light nodes are relatively simple, but you still need to install the node software, set configuration variables, download dependencies, and check ports and health to make sure they work.
Full nodes are much more cumbersome. The biggest problem is that you must download every block, from the genesis block (block 0) to the latest one, and replay every block and transaction that anyone has ever submitted. The Ethereum mainnet has over 10 million blocks and billions of transactions, so this synchronization can take weeks.
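To see why a full sync is slow, here is a deliberately simplified toy model in Python. It is not how Ethereum clients actually store or execute state. The idea it shows is that a node can only know the current balances by replaying every transaction in every block, in order, starting from genesis. The Transaction and Block types are hypothetical. Real Ethereum state also includes contract storage, nonces, gas accounting, and much more.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Transaction:
    sender: str
    recipient: str
    amount: int


@dataclass
class Block:
    number: int
    transactions: list[Transaction] = field(default_factory=list)


def replay_chain(blocks: list[Block], genesis_balances: dict[str, int]) -> dict[str, int]:
    """Rebuild account balances by applying every transaction of every block in order."""
    balances = dict(genesis_balances)
    expected = 0
    for block in blocks:
        # Blocks must be processed strictly in order: block N builds on block N-1.
        if block.number != expected:
            raise ValueError(f"missing block {expected}, got {block.number}")
        for tx in block.transactions:
            if balances.get(tx.sender, 0) < tx.amount:
                raise ValueError(f"invalid transaction in block {block.number}")
            balances[tx.sender] -= tx.amount
            balances[tx.recipient] = balances.get(tx.recipient, 0) + tx.amount
        expected += 1
    return balances


chain = [
    Block(0),  # genesis block with no transactions
    Block(1, [Transaction("alice", "bob", 30)]),
    Block(2, [Transaction("bob", "carol", 10)]),
]
print(replay_chain(chain, {"alice": 100}))  # {'alice': 70, 'bob': 20, 'carol': 10}
```

With three blocks this is instant. The same loop over millions of blocks, each needing real signature checks and contract execution, is why a full sync takes so long.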
There is also a type of Ethereum node called an archive node, which keeps every historical state and is useful for historical lookups.
You have to manage your nodes yourself! Here's a quick recap:
Nodes need to be upgraded every few weeks and occasionally rebuilt from scratch (for example, after hard forks or client upgrades).
Because most node software is not designed with reliability in mind, certain queries (such as eth_getLogs) can require scanning millions of blocks and transactions, resulting in frequent timeouts or node crashes, which we call "dead queries." So you have to keep a close eye on the health of your nodes, often debugging them at 3 a.m.
Individual nodes may lag behind the network for a variety of reasons (poor peer connections, getting stuck on stale branches, internal state problems, etc.). If your nodes lag, users may unknowingly receive outdated data, which can be a bad and even dangerous experience.
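Two of these chores can be partly scripted. One common way to keep an eth_getLogs call from turning into a dead query is to split a large block range into smaller windows and query each one separately. A basic lag check compares your node's latest block number (for example, from eth_blockNumber) against the heads reported by sources you trust, such as a second node. Here is a minimal Python sketch, assuming the standard JSON-RPC eth_getLogs filter object. The helper names are hypothetical, and the 2,000-block window and 3-block lag threshold are arbitrary example values, not limits imposed by any client or provider.

```python
from __future__ import annotations

from typing import Iterator


def block_ranges(from_block: int, to_block: int, max_span: int) -> Iterator[tuple[int, int]]:
    """Yield inclusive (start, end) ranges covering from_block..to_block, each at most max_span blocks."""
    if max_span <= 0:
        raise ValueError("max_span must be positive")
    start = from_block
    while start <= to_block:
        end = min(start + max_span - 1, to_block)
        yield start, end
        start = end + 1


def get_logs_payloads(address: str, from_block: int, to_block: int, max_span: int = 2_000) -> list[dict]:
    """Build one small eth_getLogs JSON-RPC body per window instead of one huge query."""
    return [
        {
            "jsonrpc": "2.0",
            "id": i,
            "method": "eth_getLogs",
            # Block numbers are sent as hex-encoded quantities, e.g. 4000 -> "0xfa0".
            "params": [{"address": address, "fromBlock": hex(start), "toBlock": hex(end)}],
        }
        for i, (start, end) in enumerate(block_ranges(from_block, to_block, max_span))
    ]


def blocks_behind(local_head: int, reference_heads: list[int]) -> int:
    """How far the local node trails the highest head reported by other sources."""
    if not reference_heads:
        raise ValueError("need at least one reference head")
    return max(0, max(reference_heads) - local_head)


def is_lagging(local_head: int, reference_heads: list[int], max_lag: int = 3) -> bool:
    """Treat the node as unhealthy if it is more than max_lag blocks behind."""
    return blocks_behind(local_head, reference_heads) > max_lag


print(list(block_ranges(0, 9, 4)))  # [(0, 3), (4, 7), (8, 9)]
print(is_lagging(10_600_520, [10_600_532, 10_600_531]))  # True: 12 blocks behind
```

Chunking does not make the total work smaller. It keeps each individual request small enough to finish before a timeout, and a failed window can be retried on its own.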
Scaling to multiple nodes is tricky
When you're building personal projects, a single node is generally fine (even if it crashes now and then). But what happens when one node server can no longer keep up with the requests you send?
"I'll just run two nodes and set up a load balancer between them!" you might say. That's what we thought! Unfortunately, this setup is very difficult to keep consistent, because different nodes "see" the latest state of the blockchain slightly differently, leading to inconsistent data and other problems for users.
Imagine this: we have two nodes, synchronized separately, behind a load balancer. Node A considers the latest block to be block 5, and node B considers it to be block 4. This is perfectly normal: new blocks take time to propagate across the network, so some nodes are always ahead of others.
You: Hey, Mr. Load Balancer! What's the latest block you see?
Mr. LB: (sends a request to node A and returns its response) The latest block on the network is block 5.
You: Thanks! Can you share the information in block 5 with me?
Mr. LB: (sends the request to node B this time) Sorry, I don't know about block 5. Please try again.
In the real world, imagine a user buying an NFT in your application. Their purchase request may go to node A, but when their follow-up query goes to node B, it looks as if the purchase never happened! "Consistency problems" like these are common and very difficult to solve, especially when you scale out to dozens of nodes.
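The exchange with Mr. LB can be reproduced with a toy model in Python. The Node and RoundRobinBalancer classes are hypothetical stand-ins, not a real load balancer. Each node only knows blocks up to its own head, and the balancer forwards each request to the next node in turn.

```python
from __future__ import annotations

import itertools


class Node:
    """A toy node that only knows about blocks up to its own head."""

    def __init__(self, name: str, head: int):
        self.name = name
        self.head = head

    def latest_block(self) -> int:
        return self.head

    def get_block(self, number: int) -> str | None:
        # None means the block has not reached this node yet.
        return f"block {number} from node {self.name}" if number <= self.head else None


class RoundRobinBalancer:
    """Naive load balancer: each request goes to the next node in turn."""

    def __init__(self, nodes: list[Node]):
        self._next_node = itertools.cycle(nodes)

    def latest_block(self) -> int:
        return next(self._next_node).latest_block()

    def get_block(self, number: int) -> str | None:
        return next(self._next_node).get_block(number)


lb = RoundRobinBalancer([Node("A", head=5), Node("B", head=4)])
latest = lb.latest_block()    # answered by node A
block = lb.get_block(latest)  # answered by node B, which has not seen block 5
print(latest, block)  # 5 None
```

Both nodes are behaving correctly on their own. The inconsistency comes purely from the balancer mixing their answers within a single user's sequence of requests.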
What is a node provider?
Node providers are essentially teams that give you a way to access information on the blockchain without having to run your own nodes. You send requests over the Internet to a provider that exposes the same API. Instead of relying on a local node, the provider runs up-to-date, fully synchronized nodes that are available 24/7.
Remember the earlier eth_blockNumber request? Here is what the same request looks like when it is sent to a provider:
We simply swapped endpoints without making any other changes.
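The provider version of the request is not reproduced in this copy either. This hedged Python sketch uses only the standard library and shows that the only change is the URL. The endpoint below is a placeholder in the shape Alchemy's HTTPS URLs currently take, https://eth-mainnet.g.alchemy.com/v2/YOUR_API_KEY. Treat that format as an assumption and copy the exact URL from your own dashboard.

```python
import json
import urllib.request

LOCAL_NODE_URL = "http://localhost:8545"
# Assumed URL format and placeholder key; use the exact endpoint from your provider's dashboard.
PROVIDER_URL = "https://eth-mainnet.g.alchemy.com/v2/YOUR_API_KEY"

# The JSON-RPC body is identical no matter where it is sent.
BLOCK_NUMBER_PAYLOAD = {"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id": 0}


def make_request(endpoint: str) -> urllib.request.Request:
    """Build the same eth_blockNumber POST request for any endpoint."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(BLOCK_NUMBER_PAYLOAD).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


local_req = make_request(LOCAL_NODE_URL)
provider_req = make_request(PROVIDER_URL)
# Only the URL differs; the method, headers, and body are the same.
assert local_req.data == provider_req.data
print(provider_req.full_url)
```

To actually send it, pass provider_req to urllib.request.urlopen once you have a real API key.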
A reliable node provider should offer at least:
Regularly updated nodes, with access to both light and full nodes, so you don't have to worry about forks or network changes
Archive nodes for accessing historical transaction data (only Alchemy offers this for free)
Scalability and reliability: nodes are always available whenever you need them
Handling of some tricky blockchain issues on your behalf
My dApp runs fine locally! Why do I need a node provider?
You don't need a node provider until you're ready to send traffic to a public testnet or to mainnet! A local blockchain for testing (such as the one provided by Hardhat or Truffle's Ganache) is all you need to build and test your project.
Once you want to deploy your application to a live chain, however, a node provider becomes a critical part of your development workflow.
First, you need a way to deploy smart contracts to the blockchain. Deployment happens through transactions, and transactions can only reach the blockchain through a node. This means either running your own node or sending the transaction through a provider.
Second, your application may need to keep reading information from the blockchain to update its internal state. This information also comes through a node or node provider, and you want that channel to be reliable and properly synchronized so that you don't serve stale or corrupted data to users.
What is Alchemy, and how does it differ from other node providers?
Alchemy is essentially a blockchain node provider with extremely high reliability, excellent customer support, and extensive developer tools. It boasts that 70% of the top blockchain applications send their traffic through it.
Several features set it apart from other node providers in the space:
Developer tools: How do you know what requests your users are sending? If their requests fail, how do you view and debug them? What about a transaction you sent that is still waiting to be mined? The Alchemy Dashboard provides a number of tools for analyzing your dApp's traffic that would otherwise require digging through pages of logs.
Push notifications: What if you want to be alerted whenever an Ethereum user you follow (e.g., Vitalik Buterin) makes a transaction? You could write a script that reads every block, searches it for a specific address, and runs 24/7, or you can use a tool like Alchemy Notify, which sends push notifications (webhooks) for events on the blockchain.
Enhanced APIs: What if you want to find all transactions made by a single ETH user? While this might be simple in an SQL database, it's very complicated on a blockchain, where you basically need to scan every transaction (again, there are billions on Ethereum) to see whether it includes the address. We've built several enhanced APIs that let you make this query, and others like it, in real time.
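The 24/7 monitoring script and the "all transactions for one address" query both come down to the same brute-force pattern: visit every transaction in every block and compare addresses. Below is a toy in-memory sketch of that scan in Python. The Tx type, the fake addresses, and the block contents are made up. A real script would fetch each block from a node, for example with eth_getBlockByNumber. The sketch makes one thing clear: the work grows with the total number of transactions on the chain, not with the number of matches.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    tx_hash: str
    from_addr: str
    to_addr: str | None  # None for contract-creation transactions


def transactions_involving(blocks: dict[int, list[Tx]], address: str) -> list[tuple[int, str]]:
    """Brute-force scan: check every transaction in every block for the address."""
    target = address.lower()  # compare hex addresses case-insensitively
    matches = []
    for number in sorted(blocks):
        for tx in blocks[number]:
            if tx.from_addr.lower() == target or (tx.to_addr or "").lower() == target:
                matches.append((number, tx.tx_hash))
    return matches


# Toy data with placeholder "addresses" and hashes.
chain = {
    1: [Tx("0xaa", "0xAlice", "0xBob")],
    2: [Tx("0xbb", "0xCarol", "0xDave"), Tx("0xcc", "0xBob", None)],
}
print(transactions_involving(chain, "0xbob"))  # [(1, '0xaa'), (2, '0xcc')]
```

A watcher script would run the same inner loop on each new block as it arrives, which is exactly the always-on job that a webhook service takes off your hands.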
How do I get started?
Using Alchemy as your node provider is very simple. In fact, it only takes one line of code to get started! If you're already using web3.js or ethers.js, just create a free Alchemy account, generate an API key, and replace your provider instantiation with something like the following: