Problem
Sometimes a Storage Node Operator may encounter the "database disk image is malformed" error in their log. This could happen during unplanned shutdown or reboot. The error indicates that one or more of the sqlite3
databases may have become corrupted.
Solution
- Firstly, we should try to verify the database with an embedded SQLite3 command. So, we need to have
sqlite3
installed (v3.25.2 or later). The installation steps depend on the OS. - We will use Docker instead of direct installation (this option is available only for x86_64 CPUs, for arm-based boards you will need to install
sqlite3
via the package manager of your OS). - Stop the storagenode
- Make a backup of all the
sqlite3
databases. They are located in thestorage
folder for your data storage. For examplex:\storagenode\storage\bandwidth.db
, wherex:\storagenode
is the data folder you had specified in the--mount type=bind,source=x:\storagenode,destination=/app/config
option of thedocker run
command for your storagenode, orx:\storagenode\storage
in case of using the Windows GUI, in thestorage.path:
option of theconfig.yaml
file. - Check each database for errors. We will use
bandwidth.db
as an example in this guide.- Docker (replace
${PWD}
with an absolute path to the databases location, or simple switch the current location to there)docker run --rm -it --mount type=bind,source=${PWD},destination=/data sstc/sqlite3 find . -maxdepth 1 -iname "*.db" -print0 -exec sqlite3 '{}' 'PRAGMA integrity_check;' ';'
- Direct installation
-
Linux
sudo apt update && sudo apt install sqlite3 -y
- Windows
https://www.sqlitetutorial.net/download-install-sqlite/ - Make sure that the version is v3.25.2 or later, otherwise the check will not work correctly.
sqlite3 --version
- perform the integrity check (perform for each database), for example for
bandwidth.db
:
sqlite3 /path/to/storage/bandwidth.db "PRAGMA integrity_check;"
- Or check all databases with help of shell commands:
- bash (Linux)
find /path/to/storage/ -maxdepth 1 -iname "*.db" -print0 -exec sqlite3 '{}' 'PRAGMA integrity_check;' ';'
- PowerShell (Windows)
Get-ChildItem X:\storagenode\storage\*.db -File | %{$_.Name + " " + $(sqlite3.exe $_.FullName "PRAGMA integrity_check;")}
- bash (Linux)
-
- Docker (replace
- If you see errors in the output, then the check did not pass. We will unload all uncorrupted data and then load it back. But this could sometimes fail, too. If no errors occur here, you can skip all the following steps and start the storagenode again.
-
If you were not lucky and the check failed, then please try to fix the corrupted database(s) as shown below.
-
Open a shell
-
Docker version of sqlite3. Open a shell Inside the container:
docker run --rm -it --mount type=bind,source=x:\storagenode\storage,destination=/storage sstc/sqlite3 sh
Tip. You can use
tmpfs
to restore your databases. It uses memory instead of disk and should take a lot less time than on HDD (you can read more about usage oftmpfs
with Docker in the Use tmpfs mounts guide or this forum comment). For Windows or MacOS you must increase the allocated RAM for the docker's VM via Docker desktop application to fit a double size of the greatest corrupted database file in case of usage oftmpfs
. - You could use your shell directly if you have sqlite3 installed. In that case, use the path to your storage instead of "/storage/" across this guide below.
- For Windows with local sqlite3 installed, we recommend to use a PowerShell to execute the commands below. Don't forget to replace the "/storage/" folder with your local path to the folder where the databases are stored. If the
sqlite3.exe
executable is not in the system variable PATH, then you should specify the full path to it or run from the location of the executable.
-
- Now run the following commands in the shell. You need to repeat steps 9 to 14 for each corrupted
sqlite3
database:cp /storage/bandwidth.db /storage/bandwidth.db.bak
sqlite3 /storage/bandwidth.db -
You will see a prompt from
sqlite3
. Run this SQL script:.mode insert
.output /storage/dump_all.sql
.dump
.exit -
We will edit the SQL file
dump_all.sql
-
Linux or docker version:
{ echo "PRAGMA synchronous = OFF ;"; cat /storage/dump_all.sql; } | grep -v -e TRANSACTION -e ROLLBACK -e COMMIT >/storage/dump_all_notrans.sql
- PowerShell (Windows) with a local sqlite3 version:
$(echo "PRAGMA synchronous = OFF ;"; Get-Content dump_all.sql) | Select-String -NotMatch "TRANSACTION|ROLLBACK|COMMIT" | Set-Content -Encoding utf8 dump_all_notrans.sql
-
-
Remove the corrupted database (make sure that you have a backup!)
rm /storage/bandwidth.db
-
Now we will load the unloaded data into the new database
sqlite3 /storage/bandwidth.db ".read /storage/dump_all_notrans.sql"
-
Check that the new database (
bandwidth.db
in our example) has a size larger than 0:-
Linux or docker version
ls -l /storage/bandwidth.db
- PowerShell (Windows) with a local sqlite3 version:
ls /storage/bandwidth.db
-
-
Exit from the container (skip this step, if you use a directly installed sqlite3)
exit
-
If you are lucky and all corrupted
sqlite3
databases are fixed, then you can start the storagenode again.
Warning. If you were not successful with the fix of the database, then your stat is lost.
You need to follow the guide How to fix database: file is not a database error.
Prevention
On Windows: disable the write cache. Consider migrating to the Windows GUI instead of using Docker.
On Unraid: update to the latest version of the platform (the bug is fixed in the 6.8.0-rc5 as seen in this comment) or rollback to version 6.6.7.
On Docker: use the updated docker run
command from the documentation: https://docs.storj.io/node/get-started/install-node-software/cli/storage-node#step-3-run-the-storage-node
Common Problems
Make sure that you are not using NFS or SMB to connect to the storage, they are not compatible with SQLite. The only working network protocol is iSCSI.
Make sure that your external USB drive has enough power and it does not turn off during operations. It's better to avoid using them and use only internal drives.
Comments
0 comments
Please sign in to leave a comment.