I have read a bit about erasure coding, and many people say you cannot, or should not, build an EC 4_2 pool (k=4, m=2) with a host failure domain if you only have 4 hosts. I am building a Ceph cluster to test a few things (specifically a problem I have at work with CTDB) and went with the same configuration we use there: the 4_2 EC pool with a host failure domain that people say not to use.
When I built it in my test lab, though, the pool never becomes available. The PGs just stay at “512 undersized+peered”. I built it with 7 OSDs per node, 28 in total. The EC profile looks like this:
ceph osd erasure-code-profile get EC42
crush-device-class=ssd
crush-failure-domain=host
crush-root=default
directory=/usr/lib/ceph/erasure-code
jerasure-per-chunk-alignment=false
k=4
m=2
packetsize=2048
plugin=jerasure
technique=reed_sol_van
w=8
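For what it's worth, here is the arithmetic behind the warning as I understand it. This is only a minimal Python sketch, not anything Ceph itself runs. The function name is made up for illustration, and it assumes that crush-failure-domain=host means one chunk per host and that the pool keeps a min_size of k+1 (which I believe is the default for EC pools, but I have not checked it on this cluster).

```python
# Sketch only: count how many EC chunks can land on distinct hosts.
# k, m and the host count come from the post; everything else is an assumption.

def chunks_placeable(k: int, m: int, failure_domains: int) -> int:
    """Return how many of the k+m chunks can each get their own failure domain."""
    return min(k + m, failure_domains)

k, m, hosts = 4, 2, 4

placed = chunks_placeable(k, m, hosts)   # chunks that find a distinct host
missing = (k + m) - placed               # chunks with nowhere to go
assumed_min_size = k + 1                 # assumption: default EC min_size
can_go_active = placed >= assumed_min_size

print(f"chunks needed: {k + m}, hosts available: {hosts}, unplaced per PG: {missing}")
print(f"placed={placed}, assumed min_size={assumed_min_size}, active possible: {can_go_active}")
```

If those assumptions hold, every PG is short 2 chunks and sits below min_size, which would match PGs that peer but never go active.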
So, I have two questions.
- What do I have to do to make this actually work?
- Is this OK to do? Can a 4_2 pool on 4 hosts be stable and work well without data loss? We had some issues after setup at work, but they seemed to come down to a faulty piece of hardware.
Thanks for any advice you offer.