Root partition mounts as read-only after failed cockpit-ceph-deploy

So this all started when I was trying to use cockpit-ceph-deploy to create a Ceph cluster out of one HL15 and three Raspberry Pis… I know, it’s a weird idea.
I was unable to get the ceph-deploy to complete the device-aliases.yml playbook on the Pis, even after manually setting up the device aliases according to this GitHub issue, so I decided to give up on that for now and run the ceph-deploy with only the one HL15 (basically a single-node Ceph cluster, with the option of adding the Raspberry Pis later).
In the middle of all this (or possibly before), I physically moved all my 4TB ZFS hard drives from slots 1-1 through 1-5 to slots 1-11 through 1-15, and installed an 8TB drive in slot 1-1.
When I tried the ceph-deploy with only the HL15, I kept getting errors during the core.yml playbook related to ceph_volumes.py, which showed that the playbook was trying to use the ZFS drives and not the empty 8TB drive that I had installed specifically for Ceph. I tried manually editing the ceph_volumes.py file to do what I wanted (bad idea) and got a different error, the details of which I do not remember. I later discovered that the 8TB drive was no longer even recognized by the OS, so I decided to reboot before digging around any further, in case I had tried to hot-swap that drive and forgotten about it…
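In case it helps anyone else hitting the same thing, this is roughly how I was checking whether the OS could see the drives at all (device names here are just examples from my box):

    # list every block device with size/model so the 8TB drive is easy to spot
    lsblk -o NAME,SIZE,MODEL,SERIAL,MOUNTPOINT
    # check for leftover filesystem/LVM signatures without actually wiping anything
    sudo wipefs --no-act /dev/sdX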

After THAT reboot, Cockpit did not start up on its own, so I SSH’d in and discovered that the root partition was mounted read-only, which had prevented all the services from running at startup.
I am able to use mount -o remount,rw / to remount it as read-write, but it goes back to read-only after a reboot.
When I checked dmesg I could not find any obvious errors; the only hint I see is the boot command line:
[ 0.000000] Command line: BOOT_IMAGE=(hd6,gpt2)/vmlinuz-4.18.0-513.18.1.el8_9.x86_64 root=/dev/mapper/rl-root ro crashkernel=auto resume=/dev/mapper/rl-swap rd.lvm.lv=rl/root rd.lvm.lv=rl/swap rhgb quiet
It says “... root=/dev/mapper/rl-root ro ...”, which I am guessing means it is being mounted read-only.
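From what I have read, the “ro” on the kernel command line is actually normal, and systemd is supposed to remount / read-write later in boot based on /etc/fstab, so I have been checking the fstab options and the grub defaults like this (paths are the usual Rocky/RHEL 8 locations, adjust as needed):

    # make sure the root entry in fstab is not set to "ro"
    grep rl-root /etc/fstab
    # check the default kernel arguments that grub.cfg is generated from
    grep GRUB_CMDLINE_LINUX /etc/default/grub
    # if /etc/default/grub needs editing, regenerate grub.cfg afterwards
    # (make sure /boot is actually mounted first; UEFI systems use /boot/efi/EFI/rocky/grub.cfg instead)
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg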

The /home partition mounts as read-write as it should, but the /boot folder is empty. If I run sudo mount -a then /boot is still empty, but I can use mount /dev/nvme0n1p1 /mnt/boot to mount and access the boot partition just fine.
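My current suspicion for the empty /boot is that the fstab entry no longer matches the partition (the UUID would have changed if anything rewrote that drive), so I am comparing the two like this (nvme0n1p1 is just my boot partition, adjust for your layout):

    # the UUID the partition actually has right now
    sudo blkid /dev/nvme0n1p1
    # the UUID (or device) that fstab expects for /boot
    grep /boot /etc/fstab
    # if they differ, fix the fstab line, then this should bring /boot back
    sudo mount -a && ls /boot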

If anyone has advice on how I might figure out the root cause of this, I would appreciate it; I am not sure whether it is just a boot config issue or caused by some sort of I/O problem. I have heard that some Linux systems will mount the root filesystem read-only if there are disk I/O errors during boot, but I cannot find any indication of such errors on my system…
I also tried running fsck and xfs_repair, and it still boots with the root partition read-only.
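(Since xfs_repair refuses to run on a mounted filesystem, my understanding is that the usual way to check the root filesystem is from the install ISO’s rescue/troubleshooting mode, along these lines, with the LVM names assumed from the default rl volume group:)

    # from the rescue shell: activate the volume group so rl-root shows up
    vgchange -ay rl
    # dry run first; -n reports problems without modifying anything
    xfs_repair -n /dev/mapper/rl-root
    # only run the actual repair if the dry run finds something
    xfs_repair /dev/mapper/rl-root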

I will try to upload the full output of dmesg if that would be helpful.

Hi @NerdyGriffin, I would guess that when you ran the ceph-core playbook it tried to wipe and use your OS boot drive as an OSD drive, since the aliasing was not set up correctly.

My guess is that your boot drive is now in a broken state because it was wiped/overwritten by the playbook.
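One quick way to check is to look at what is actually sitting on the boot drive now; if the playbook grabbed it as an OSD you would normally see ceph LVM volumes on it. Something like this (device name is just an example, use whatever your boot drive is):

    # see what partitions and LVs are on the boot NVMe now
    lsblk /dev/nvme0n1
    # any ceph-* volume groups here would confirm it was claimed as an OSD
    sudo pvs && sudo vgs
    # if ceph-volume is installed, this lists the OSDs it created
    sudo ceph-volume lvm list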

Ouch, that sounds painful to fix.
Is there any way for me to restore the original boot drive image that it shipped with? I got it working somehow a week ago, possibly by editing the grub boot command line (I have forgotten exactly what I did since then). I suspect it is in a broken state, as you said, because I keep encountering new weird errors and issues…

Also, if I reinstall the OS, is there an easy way for me to back up and restore the Samba net registry?
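(What I had in mind was something along these lines, assuming the registry-based share config; net conf and tdbbackup are standard Samba tools, and the registry.tdb path may differ on other systems:)

    # dump the registry-based share definitions to a plain text file
    sudo net conf list > smb-shares-backup.txt
    # after reinstalling, import them back into the registry
    sudo net conf import smb-shares-backup.txt
    # or back up the raw registry database itself
    sudo tdbbackup /var/lib/samba/registry.tdb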

I also did not realize that the ceph-deploy playbooks would uninstall the ZFS packages; I think that should be stated explicitly in the README.
I spent a couple of days trying to reverse-engineer the ceph-deploy source code just to figure out what exactly it was doing when the playbook says “Removing packages…”

We are working on getting a master image ISO hosted for homelabbers to download.

The reason the ceph playbooks remove ZFS is that if you are building a Ceph cluster there is no need for ZFS to be installed, and we wouldn’t want ZFS pools and Ceph OSD drives in the same host, since Ceph is all about shared storage and ZFS is only local storage.


I understand and agree with the reasoning; I just wish it were mentioned in the documentation, since it is a potentially breaking change to the system.

Thank you for the update, I will keep an eye out for that potential future release.