
no match of right hand value {error,enospc} #5265

Open
SourceR85 opened this issue Sep 30, 2024 · 8 comments

@SourceR85
Contributor

SourceR85 commented Sep 30, 2024

Description

I've set up a fresh CouchDB 3.4.1 instance (as a Docker image, built from https://github.com/apache/couchdb-docker/tree/main/3.4.1).
Then I started a replication from the production server and saw endless messages of "no match of right hand value {error,enospc}".

Here is a (truncated) copy of the Docker log:
couchdb.tar.gz

Your Environment

  • CouchDB version used:
    version: 3.4.1
    "git_sha": "f504e38a5",
    "features": [
        "nouveau",
        "access-ready",
        "partitioned",
        "pluggable-storage-engines",
        "reshard",
        "scheduler"
    ]
  • Operating system and version:
    Fedora Linux 40 (KDE Plasma) and
    Ubuntu Server 24
    (Both running the same docker image and report the same error)

Additional Context

Docker Engine
 Version:    27.3.1
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.16.2-desktop.1
  compose: Docker Compose (Docker Inc.)
    Version:  v2.29.2-desktop.2
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15

I've talked a bit with Jan on Slack. His first thoughts:
https://app.slack.com/client/T49P1AZRT/C49LEE7NW

@nickva
Contributor

nickva commented Sep 30, 2024

The enospc in "no match of right hand value {error,enospc}" indicates we're probably running out of disk space [1].

It should be a friendlier message in the log, but at first sight that's what jumps out.

[1] https://www.man7.org/linux/man-pages/man3/errno.3.html
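The errno mapping above can be checked mechanically, and it also hints at what a friendlier message could look like. Below is a minimal sketch (the helper describe_posix_error is hypothetical, not part of CouchDB) that pulls a POSIX error atom such as enospc out of an Erlang-style {error,...} term and turns it into the platform's errno description. It relies only on Erlang's file module reporting POSIX errors as lowercase atoms and on Python's errno and os.strerror.

```python
import errno
import os
import re
from typing import Optional

# Erlang's file module reports POSIX errors as lowercase atoms, e.g. {error,enospc}.
# Only atoms starting with "e" are considered, since POSIX errno names start with E.
POSIX_ERROR_RE = re.compile(r"\{error,(e[a-z0-9]+)\}")


def describe_posix_error(text: str) -> Optional[str]:
    """Return a readable description of the first {error,eXXX} term in text, or None."""
    match = POSIX_ERROR_RE.search(text)
    if match is None:
        return None
    name = match.group(1).upper()      # "enospc" -> "ENOSPC"
    code = getattr(errno, name, None)  # platform errno number, if that name exists here
    if not isinstance(code, int):
        return None
    return f"{name} (errno {code}): {os.strerror(code)}"
```

On a typical Linux system, describe_posix_error("no match of right hand value {error,enospc}") yields "ENOSPC (errno 28): No space left on device", matching the errno(3) entry.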

@SourceR85
Contributor Author

SourceR85 commented Sep 30, 2024

enospc from no match of right hand value {error,enospc} indicates we're probably running out of disk space

That's not the problem...
I have 799.7 GB of 2 TB free (the DB I'm replicating is 86.1 GB).
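Free space on the host is not necessarily what CouchDB sees, so it is worth measuring from where CouchDB actually writes. A minimal sketch with hypothetical helper names; the safety factor is an arbitrary illustrative margin, not a CouchDB recommendation.

```python
import shutil


def free_bytes(path: str) -> int:
    """Free bytes on the file system that contains path, as seen by this process.

    Run inside a container, this reports the container's view of the volume,
    which can be much smaller than the host disk's free space.
    """
    return shutil.disk_usage(path).free


def has_headroom(path: str, needed_bytes: int, safety_factor: float = 2.0) -> bool:
    """True if at least needed_bytes * safety_factor bytes are free under path.

    The factor of 2 is an arbitrary margin, e.g. for append-only database
    files growing before compaction.
    """
    return free_bytes(path) >= needed_bytes * safety_factor
```

For example, calling has_headroom("/opt/couchdb/data", int(86.1e9)) inside the container compares the container's free space with the 86.1 GB database; /opt/couchdb/data is assumed here to be the official image's data directory, so adjust it to your deployment.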

@nickva
Contributor

nickva commented Sep 30, 2024

Is there any chance the view directory is configured to write to another disk, or that the disks fail to mount and CouchDB ends up writing to the root file system? enospc is usually passed through transparently from the file system layer.

The first instance in the logs seems to come from writing an attachment:

gen,do_call,4,[{file,"gen.erl"},{line,237}]},{gen_server,call,3,[{file,"gen_server.erl"},{line,381}]},
{couch_att,write_streamed_attachment,3,

Is there a way to reconfigure the data directory or point it to another volume? Or test whether you can write to it manually? Verify that the data directory really points to the large mounted volume; misconfigurations happen, and I've seen writes go to a different directory than the intended one.
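Both checks suggested here can be scripted. A minimal sketch with hypothetical helper names: find_mount_point shows which mount a (possibly symlinked) data directory really lives on, and probe_write writes and fsyncs real data instead of only creating an empty file, since touch can succeed on a volume that has no room left for content. The 64 MiB default is an arbitrary choice.

```python
import errno
import os
import tempfile
from typing import Optional


def find_mount_point(path: str) -> str:
    """Walk up from path (symlinks resolved) to the mount point that contains it."""
    path = os.path.realpath(path)
    while not os.path.ismount(path):
        parent = os.path.dirname(path)
        if parent == path:  # reached "/" (normally ismount("/") is already True)
            break
        path = parent
    return path


def probe_write(directory: str, size_bytes: int = 64 * 1024 * 1024,
                chunk_bytes: int = 1024 * 1024) -> Optional[str]:
    """Write size_bytes of data to a temp file in directory, fsync it, then delete it.

    Returns None on success, or the errno name (e.g. "ENOSPC") on failure.
    Unlike touch, which only creates an empty file, this needs free data blocks.
    """
    chunk = os.urandom(chunk_bytes)  # random bytes, so compressing file systems can't shrink them
    try:
        with tempfile.NamedTemporaryFile(dir=directory) as f:
            written = 0
            while written < size_bytes:
                n = min(chunk_bytes, size_bytes - written)
                f.write(chunk[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())  # push data to the device so late allocation errors surface here
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc))
    return None
```

Run inside the container against its data directory, a result of "ENOSPC" from probe_write would confirm what the logs report, while find_mount_point tells you whether that directory is on the volume you expect.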

@SourceR85
Contributor Author

SourceR85 commented Sep 30, 2024

As you expected: the Docker volume got stuck...
I can't write content into data (only touching a file works).

This is my Docker deployment (secrets removed):
couchdb.tar.gz
There's nothing fancy in it, as far as I can tell...

@nickva
Contributor

nickva commented Sep 30, 2024

Can't write content into data (just touch file works)

That would explain it, I think. Good find. It's sneaky that touch works though.

@SourceR85
Contributor Author

SourceR85 commented Sep 30, 2024

Just out of curiosity, I stopped the container, removed and re-created couchdb-data, and started the replication again:
same result...

[notice] 2024-09-30T16:14:19.553744Z nonode@nohost <0.14636.101> -------- Retrying POST request to http://localhost:5984/hzd/_bulk_docs in 4.0 seconds due to error {code,500}
[error] 2024-09-30T16:14:19.574327Z nonode@nohost <0.16657.101> d5dfe20e02 rexi_server: from: nonode@nohost(<0.19120.101>) mfa: fabric_rpc:update_docs/3 exit:{{badmatch,{error,enospc}},[{couch_bt_engine,write_doc_body,2,[{file,"src/couch_bt_engine.erl"},{line,439}]},{couch_db_updater,'-flush_trees/3-fun-0-',6,[{file,"src/couch_db_updater.erl"},{line,384}]},{couch_key_tree,mapfold_simple,4,[{file,"src/couch_key_tree.erl"},{line,464}]},{couch_key_tree,mapfold_simple,4,[{file,"src/couch_key_tree.erl"},{line,473}]},{couch_key_tree,mapfold,3,[{file,"src/couch_key_tree.erl"},{line,457}]},{couch_db_updater,flush_trees,3,[{file,"src/couch_db_updater.erl"},{line,373}]},{couch_db_updater,update_docs_int,4,[{file,"src/couch_db_updater.erl"},{line,718}]},{couch_db_updater,handle_info,2,[{file,"src/couch_db_updater.erl"},{line,183}]}]} [{couch_db,collect_results,3,[{file,"src/couch_db.erl"},{line,1457}]},{couch_db,collect_results_with_metrics,3,[{file,"src/couch_db.erl"},{line,1439}]},{couch_db,write_and_commit,4,[{file,"src/couch_db.erl"},{line,1471}]},{couch_db,update_docs,4,[{file,"src/couch_db.erl"},{line,1333}]},{fabric_rpc,with_db,3,[{file,"src/fabric_rpc.erl"},{line,360}]},{rexi_server,init_p,3,[{file,"src/rexi_server.erl"},{line,141}]}]
[info] 2024-09-30T16:14:19.574423Z nonode@nohost <0.243.0> -------- db shards/e0000000-ffffffff/hzd.1727710380 died with reason {{badmatch,{error,enospc}},[{couch_bt_engine,write_doc_body,2,[{file,"src/couch_bt_engine.erl"},{line,439}]},{couch_db_updater,'-flush_trees/3-fun-0-',6,[{file,"src/couch_db_updater.erl"},{line,384}]},{couch_key_tree,mapfold_simple,4,[{file,"src/couch_key_tree.erl"},{line,464}]},{couch_key_tree,mapfold_simple,4,[{file,"src/couch_key_tree.erl"},{line,473}]},{couch_key_tree,mapfold,3,[{file,"src/couch_key_tree.erl"},{line,457}]},{couch_db_updater,flush_trees,3,[{file,"src/couch_db_updater.erl"},{line,373}]},{couch_db_updater,update_docs_int,4,[{file,"src/couch_db_updater.erl"},{line,718}]},{couch_db_updater,handle_info,2,[{file,"src/couch_db_updater.erl"},{line,183}]}]}
[error] 2024-09-30T16:14:19.574887Z nonode@nohost <0.18010.101> -------- gen_server <0.18010.101> terminated with reason: no match of right hand value {error,enospc} at couch_bt_engine:write_doc_body/2(line:439) <= couch_db_updater:'-flush_trees/3-fun-0-'/6(line:384) <= couch_key_tree:mapfold_simple/4(line:464) <= couch_key_tree:mapfold_simple/4(line:473) <= couch_key_tree:mapfold/3(line:457) <= couch_db_updater:flush_trees/3(line:373) <= couch_db_updater:update_docs_int/4(line:718) <= couch_db_updater:handle_info/2(line:183)
  last msg: redacted
     state: {db,1,<<"shards/e0000000-ffffffff/hzd.1727710380">>,"./data/shards/e0000000-ffffffff/hzd.1727710380.couch",{couch_bt_engine,{st,"./data/shards/e0000000-ffffffff/hzd.1727710380.couch",<0.19406.101>,#Ref<0.3603940510.502005771.203208>,undefined,{db_header,8,30406,0,{9450247660,{29670,687,{size_info,9279630171,9278136634}},12600491},{9450249167,30357,11927090},{9448039553,[],2388},nil,nil,4251,1000,<<"2719778795232e78e860e5e8ab70c794">>,[{nonode@nohost,0}],0,1000,0},false,{btree,<0.19406.101>,{9450247660,{29670,687,{size_info,9279630171,9278136634}},12600491},fun couch_bt_engine:id_tree_split/1,fun couch_bt_engine:id_tree_join/2,undefined,fun couch_bt_engine:id_tree_reduce/2,snappy},{btree,<0.19406.101>,{9450249167,30357,11927090},fun couch_bt_engine:seq_tree_split/1,fun couch_bt_engine:seq_tree_join/2,undefined,fun couch_bt_engine:seq_tree_reduce/2,snappy},{btree,<0.19406.101>,{9448039553,[],2388},fun couch_bt_engine:local_tree_split/1,fun couch_bt_engine:local_tree_join/2,undefined,nil,snappy},snappy,{btree,<0.19406.101>,nil,fun couch_bt_engine:purge_tree_split/1,fun couch_bt_engine:purge_tree_join/2,undefined,fun couch_bt_engine:purge_tree_reduce/2,snappy},{btree,<0.19406.101>,nil,fun couch_bt_engine:purge_seq_tree_split/1,fun couch_bt_engine:purge_seq_tree_join/2,undefined,fun couch_bt_engine:purge_tree_reduce/2,snappy}}},<0.18010.101>,nil,30406,<<"1727712856444764">>,{user_ctx,null,[],undefined},[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}],[#Fun<couch_doc.7.91987333>],nil,nil,undefined,[{default_security_object,[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},replicated_changes,{user_ctx,{user_ctx,<<"groot">>,[<<"_admin">>],<<"cookie">>}},{w,"1"},{props,[{partitioned,true},{hash,[couch_partition,hash,[]]}]}],undefined}
    extra: []
[notice] 2024-09-30T16:14:19.574938Z nonode@nohost <0.19120.101> d5dfe20e02 localhost:5984 127.0.0.1 groot POST /hzd/_bulk_docs 500 ok 21
[error] 2024-09-30T16:14:19.575102Z nonode@nohost <0.18010.101> -------- gen_server <0.18010.101> terminated with reason: no match of right hand value {error,enospc} at couch_bt_engine:write_doc_body/2(line:439) <= couch_db_updater:'-flush_trees/3-fun-0-'/6(line:384) <= couch_key_tree:mapfold_simple/4(line:464) <= couch_key_tree:mapfold_simple/4(line:473) <= couch_key_tree:mapfold/3(line:457) <= couch_db_updater:flush_trees/3(line:373) <= couch_db_updater:update_docs_int/4(line:718) <= couch_db_updater:handle_info/2(line:183)
  last msg: redacted
     state: {db,1,<<"shards/e0000000-ffffffff/hzd.1727710380">>,"./data/shards/e0000000-ffffffff/hzd.1727710380.couch",{couch_bt_engine,{st,"./data/shards/e0000000-ffffffff/hzd.1727710380.couch",<0.19406.101>,#Ref<0.3603940510.502005771.203208>,undefined,{db_header,8,30406,0,{9450247660,{29670,687,{size_info,9279630171,9278136634}},12600491},{9450249167,30357,11927090},{9448039553,[],2388},nil,nil,4251,1000,<<"2719778795232e78e860e5e8ab70c794">>,[{nonode@nohost,0}],0,1000,0},false,{btree,<0.19406.101>,{9450247660,{29670,687,{size_info,9279630171,9278136634}},12600491},fun couch_bt_engine:id_tree_split/1,fun couch_bt_engine:id_tree_join/2,undefined,fun couch_bt_engine:id_tree_reduce/2,snappy},{btree,<0.19406.101>,{9450249167,30357,11927090},fun couch_bt_engine:seq_tree_split/1,fun couch_bt_engine:seq_tree_join/2,undefined,fun couch_bt_engine:seq_tree_reduce/2,snappy},{btree,<0.19406.101>,{9448039553,[],2388},fun couch_bt_engine:local_tree_split/1,fun couch_bt_engine:local_tree_join/2,undefined,nil,snappy},snappy,{btree,<0.19406.101>,nil,fun couch_bt_engine:purge_tree_split/1,fun couch_bt_engine:purge_tree_join/2,undefined,fun couch_bt_engine:purge_tree_reduce/2,snappy},{btree,<0.19406.101>,nil,fun couch_bt_engine:purge_seq_tree_split/1,fun couch_bt_engine:purge_seq_tree_join/2,undefined,fun couch_bt_engine:purge_tree_reduce/2,snappy}}},<0.18010.101>,nil,30406,<<"1727712856444764">>,{user_ctx,null,[],undefined},[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}],[#Fun<couch_doc.7.91987333>],nil,nil,undefined,[{default_security_object,[{<<"members">>,{[{<<"roles">>,[<<"_admin">>]}]}},{<<"admins">>,{[{<<"roles">>,[<<"_admin">>]}]}}]},replicated_changes,{user_ctx,{user_ctx,<<"groot">>,[<<"_admin">>],<<"cookie">>}},{w,"1"},{props,[{partitioned,true},{hash,[couch_partition,hash,[]]}]}],undefined}
    extra: []
[error] 2024-09-30T16:14:19.575128Z nonode@nohost <0.14636.101> -------- Replicator, request POST to "http://localhost:5984/hzd/_bulk_docs" failed due to error {code,500}
[error] 2024-09-30T16:14:19.575198Z nonode@nohost <0.18010.101> -------- CRASH REPORT Process  (<0.18010.101>) with 0 neighbors crashed with reason: no match of right hand value {error,enospc} at couch_bt_engine:write_doc_body/2(line:439) <= couch_db_updater:'-flush_trees/3-fun-0-'/6(line:384) <= couch_key_tree:mapfold_simple/4(line:464) <= couch_key_tree:mapfold_simple/4(line:473)
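When the same crash repeats endlessly, it helps to collapse the log into distinct failure sites. A hypothetical sketch that counts badmatch crashes by reason and first stack frame, assuming the "no match of right hand value ... at module:function/arity(line:N)" format seen in the log lines above:

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "no match of right hand value {error,enospc} at couch_bt_engine:write_doc_body/2(line:439)"
BADMATCH_RE = re.compile(
    r"no match of right hand value (\{[^}]*\}) at "
    r"([a-z0-9_]+):([^/(\s]+)/(\d+)\(line:(\d+)\)"
)


def summarize_badmatches(lines: Iterable[str]) -> Counter:
    """Count (reason, module:function/arity, line) for each badmatch crash line."""
    counts: Counter = Counter()
    for line in lines:
        match = BADMATCH_RE.search(line)
        if match:
            reason, module, function, arity, lineno = match.groups()
            counts[(reason, f"{module}:{function}/{arity}", int(lineno))] += 1
    return counts
```

Fed the excerpt above, the two gen_server lines and the CRASH REPORT line collapse into a single entry for couch_bt_engine:write_doc_body/2 at line 439 with reason {error,enospc}.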


@SourceR85
Contributor Author

SourceR85 commented Sep 30, 2024

My fault: I'm using Docker Desktop, whose maximum storage capacity was globally set to 100 GB, and the source (CouchDB 3.3.3) is running in parallel so I can replicate from it...
I had assumed Docker was running without limits.
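A rough budget shows why the cap was hit. This assumes both databases live in the same capped Docker Desktop disk, that the replica grows to roughly the size of the 86.1 GB source, and it ignores images and anything else stored there, so the real headroom is smaller still:

```latex
\underbrace{86.1\ \text{GB}}_{\text{source (3.3.3)}}
+ \underbrace{\sim 86.1\ \text{GB}}_{\text{replica (3.4.1)}}
\approx 172.2\ \text{GB} \;>\; 100\ \text{GB},
\qquad
100\ \text{GB} - 86.1\ \text{GB} = 13.9\ \text{GB}
```

The second figure is the most the replica could ever occupy while the source shares the same 100 GB disk.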

So nickva spotted it right in his first comment:

enospc from no match of right hand value {error,enospc} indicates we're probably running out of disk space [1]

It should be a more friendly message in the log, but at least first sight that's what's jumping out.

[1] https://www.man7.org/linux/man-pages/man3/errno.3.html

My mistake suggests two ideas for improvement:

  1. A more user-friendly error message than {error,enospc}.
  2. Quit CouchDB on that error (health checks keep passing as long as the endpoints are reachable), or report an unhealthy status from the _up endpoint (507 Insufficient Storage might fit this purpose).
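Idea 2 can be prototyped outside CouchDB as a container health check. A minimal sketch, not existing CouchDB behavior: the helper names, the free-space threshold, and the mapping to 503/507 are all assumptions; it only relies on GET /_up answering 200 when the node is up.

```python
import http.client
import shutil
import urllib.request


def couchdb_up(base_url: str, timeout: float = 5.0) -> bool:
    """True if GET {base_url}/_up answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/_up", timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError subclass OSError; HTTPException covers garbled replies.
        return False


def health_status(up: bool, free: int, min_free: int) -> int:
    """Combine liveness and free disk space into an HTTP-style status code."""
    if not up:
        return 503  # Service Unavailable
    if free < min_free:
        return 507  # Insufficient Storage, as proposed in the issue
    return 200


def check(base_url: str, data_dir: str, min_free: int) -> int:
    """Hypothetical health check: /_up liveness plus free space under data_dir."""
    return health_status(couchdb_up(base_url), shutil.disk_usage(data_dir).free, min_free)
```

In a Docker HEALTHCHECK script, you would exit non-zero unless check("http://localhost:5984", data_dir, 10 * 1024**3) returns 200, with data_dir set to the container's data directory; the 10 GiB floor is an arbitrary example. Keeping health_status pure makes the policy easy to test without a running server.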

@nickva
Contributor

nickva commented Sep 30, 2024

No worries at all, thanks for reaching out.

Yeah, agreed, a friendlier error would be nice in the logs.

And it turns out we do have a disk monitor now in 3.4 (the work of @rnewson)!

See https://docs.couchdb.org/en/stable/config/disk-monitor.html: if you configure it, CouchDB will stop indexing when approaching the limit and return a meaningful API error.

See #4681 for the PR comments and the implementation.
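The disk monitor can also be switched on at runtime through CouchDB's node config API (PUT /_node/{node}/_config/{section}/{key}, with the new value sent as a JSON string). A hedged sketch that only builds the request; the helper name is hypothetical, and the section and key names (disk_monitor, enable) are taken from my reading of the linked page, so check them against the docs for your version.

```python
import base64
import json
import urllib.request


def config_put_request(base_url: str, section: str, key: str, value: str,
                       user: str, password: str,
                       node: str = "_local") -> urllib.request.Request:
    """Build (but do not send) PUT /_node/{node}/_config/{section}/{key}.

    The config API expects the value as a JSON string body, e.g. b'"true"'.
    """
    url = f"{base_url}/_node/{node}/_config/{section}/{key}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(value).encode(),  # JSON-encode the string value
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )
```

For example, config_put_request("http://localhost:5984", "disk_monitor", "enable", "true", admin_user, admin_password) builds the request, and passing it to urllib.request.urlopen sends it to the node; thresholds would be set the same way with their respective keys.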
