
CEPH Upgrade

Guide for upgrading CEPH 15.2.1 -> 15.2.4 installed with cephadm

Introduction

We have the following cluster:

  • Ceph 15.2.1 installed with cephadm
  • 14 OSD
  • 3 MON
  • 3 RGW
  • 3 MDS
  • 3 CRASH

Version 15.2.4 has been released, and we want to upgrade.

Getting Started

Cephadm provides a mechanism for upgrading, but it didn’t work for me right away, so I had to assist it. Let’s see what exactly needs to be upgraded.

```bash
ceph orch upgrade check ceph/ceph 15.2.4
```

The command prints a JSON report listing the daemons that are already on the target version and those that still differ from it (at this point, all of them). Time to get started.
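The upgrade itself is kicked off by pointing the orchestrator at the target image; this is the same invocation that reappears later in this post when the upgrade is restarted:

```bash
ceph orch upgrade start ceph/ceph 15.2.4
```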
So, the upgrade is running, and there are a couple of commands worth keeping open to get a summary of what is happening.

  1. ceph status - for general information about the cluster’s state. You can run it in a separate terminal like this: watch -n 10 ceph status
  2. ceph -W cephadm - definitely keep this running in a separate terminal to follow the orchestrator’s log (a combined snippet follows this list).
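A minimal way to run all of this side by side; ceph orch upgrade status is not used in the walkthrough above, but cephadm provides it and it reports the target image and the overall progress:

```bash
# terminal 1: overall cluster state, refreshed every 10 seconds
watch -n 10 ceph status

# terminal 2: follow the cephadm/orchestrator log
ceph -W cephadm

# on demand: summary of the running upgrade
ceph orch upgrade status
```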

Temporary Difficulties

We run ceph orch ps and see that all the monitors and managers have been upgraded. Good, but now the cluster is in HEALTH_ERR with the error shown below.

```plain
module 'cephadm' has failed: auth get failed: failed to find client.crash.01 in keyring retval: -2
```

It turns out that some keys are missing. You need to create one for each crash daemon and then restart the orchestration.

```bash
# Create a keyring entry for every crash daemon the orchestrator knows about
for i in $(ceph orch ps | grep -oE 'crash.[0-9]+'); do
  ceph auth get-or-create "client.$i" mgr "profile crash" mon "profile crash"
done

# Restart the cephadm module and the upgrade so it picks up the new keys
ceph orch upgrade stop
ceph mgr module disable cephadm
ceph mgr module enable cephadm
ceph orch upgrade start ceph/ceph 15.2.4
```
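To confirm the keys actually landed before the upgrade resumes, query one of them back (client.crash.01 is the entry from the error above; the other hosts follow the same pattern):

```bash
# Show the newly created key and its caps
ceph auth get client.crash.01

# Or list all crash-related entries at once
ceph auth ls | grep crash
```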

Now we see that the crash daemons are done too. Good, but then everything seems to get stuck: the cluster reports HEALTH_OK, yet ceph orch ps shows all OSDs still on the old version. The log shows the following.

```plain
Upgrade: It is NOT safe to stop osd.0
```

Aha, the orchestrator is afraid to stop OSD daemons with data. Let’s help. Here is my situation.

```plain
# ceph osd tree
ID  CLASS  WEIGHT    TYPE NAME       STATUS  REWEIGHT  PRI-AFF
-1         14.00000  root default
-3          5.00000      host 01
 0    hdd   1.00000          osd.0       up   1.00000  1.00000
 1    hdd   1.00000          osd.1       up   1.00000  1.00000
 2    hdd   1.00000          osd.2       up   1.00000  1.00000
 3    hdd   1.00000          osd.3       up   1.00000  1.00000
 4    hdd   1.00000          osd.4       up   1.00000  1.00000
-5          5.00000      host 02
 5    hdd   1.00000          osd.5       up   1.00000  1.00000
 6    hdd   1.00000          osd.6       up   1.00000  1.00000
 7    hdd   1.00000          osd.7       up   1.00000  1.00000
 8    hdd   1.00000          osd.8       up   1.00000  1.00000
 9    hdd   1.00000          osd.9       up   1.00000  1.00000
-7          4.00000      host 03
10    hdd   1.00000          osd.10      up   1.00000  1.00000
11    hdd   1.00000          osd.11      up   1.00000  1.00000
12    hdd   1.00000          osd.12      up   1.00000  1.00000
13    hdd   1.00000          osd.13      up   1.00000  1.00000
```
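You can also reproduce the orchestrator’s hesitation by hand: ceph osd ok-to-stop reports whether stopping a given OSD right now would leave placement groups without enough active replicas.

```bash
# Ask Ceph whether osd.0 can be stopped safely at this moment
ceph osd ok-to-stop osd.0
```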

At first, I marked OSDs out one at a time.

```bash
ceph osd out osd.0
```

Then I got more comfortable and started taking out a whole server at a time. When osd.0 is marked out, after a while, once rebalancing finishes, ceph -W cephadm starts complaining about osd.1, and ceph orch ps shows that osd.0 has been upgraded. The idea is clear; just don’t forget to bring the OSD back in.

```bash
ceph osd in osd.0
```

In short: mark out the OSDs of the first server, wait until cephadm starts complaining about the next ones, and then mark them back in. And wait for HEALTH_OK before taking out the next server’s OSDs, or you could lose data…
Based on the OSD tree above, the plan, done from scratch, looks like this (a scripted sketch of the same loop follows the list):

  1. out 0, 1, 2, 3, 4
  2. wait for complaints about 5
  3. in 0, 1, 2, 3, 4
  4. wait for HEALTH_OK
  5. out 5, 6, 7, 8, 9
  6. wait for complaints about 10
  7. in 5, 6, 7, 8, 9
  8. wait for HEALTH_OK
  9. out 10, 11, 12, 13
  10. check that everything is upgraded with ceph orch ps
  11. in 10, 11, 12, 13
  12. wait for HEALTH_OK
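Here is a minimal sketch of one pass of that plan for host 01, assuming the OSD IDs from the tree above; the waits are deliberately left as manual checks, since the “complains about the next host” signal only shows up in the cephadm log:

```bash
# Host 01 (osd.0-osd.4); repeat the same pattern for hosts 02 and 03
for id in 0 1 2 3 4; do ceph osd out "osd.$id"; done

# ...watch "ceph -W cephadm" until it starts complaining about osd.5
# and "ceph orch ps" shows osd.0-osd.4 on the new version...

for id in 0 1 2 3 4; do ceph osd in "osd.$id"; done

# Do not touch the next host until the cluster is healthy again
while ! ceph health | grep -q HEALTH_OK; do sleep 30; done
```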
```plain
╔═╗╔═╗╔═╗╦ ╦ ╦ ╦╔═╗╔═╗╦═╗╔═╗╔╦╗╔═╗╔╦╗┬
║  ║╣ ╠═╝╠═╣ ║ ║╠═╝║ ╦╠╦╝╠═╣ ║║║╣  ║║│
╚═╝╚═╝╩  ╩ ╩ ╚═╝╩  ╚═╝╩╚═╩ ╩═╩╝╚═╝═╩╝o
```

Notes

The cluster is practically empty (see ceph osd df below), so each rebalance was quick. Had it been fuller, this would have taken days…

```plain
% sudo ceph osd df
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP     META      AVAIL    %USE  VAR   PGS  STATUS
 0    hdd  1.00000   1.00000  5.5 TiB  1.4 GiB  438 MiB   39 KiB  1024 MiB  5.5 TiB  0.03  1.11   37      up
 1    hdd  1.00000   1.00000  5.5 TiB  1.4 GiB  454 MiB  2.2 MiB  1022 MiB  5.5 TiB  0.03  1.12   46      up
 2    hdd  1.00000   1.00000  5.5 TiB  1.3 GiB  314 MiB  1.0 MiB  1023 MiB  5.5 TiB  0.02  1.01   24      up
 3    hdd  1.00000   1.00000  5.5 TiB  1.2 GiB  161 MiB  2.5 MiB  1022 MiB  5.5 TiB  0.02  0.90   34      up
 4    hdd  1.00000   1.00000  5.5 TiB  1.2 GiB  242 MiB   21 KiB  1024 MiB  5.5 TiB  0.02  0.96   34      up
 5    hdd  1.00000   1.00000  5.5 TiB  1.2 GiB  212 MiB  3.0 MiB  1021 MiB  5.5 TiB  0.02  0.93   34      up
 6    hdd  1.00000   1.00000  5.5 TiB  1.5 GiB  485 MiB  533 KiB  1023 MiB  5.5 TiB  0.03  1.14   35      up
 7    hdd  1.00000   1.00000  5.5 TiB  1.3 GiB  280 MiB  1.8 MiB  1022 MiB  5.5 TiB  0.02  0.99   41      up
 8    hdd  1.00000   1.00000  5.5 TiB  1.3 GiB  265 MiB   19 MiB  1005 MiB  5.5 TiB  0.02  0.97   38      up
 9    hdd  1.00000   1.00000  5.5 TiB  1.4 GiB  359 MiB  949 KiB  1023 MiB  5.5 TiB  0.02  1.05   28      up
10    hdd  1.00000   1.00000  5.5 TiB  1.2 GiB  179 MiB  467 KiB  1024 MiB  5.5 TiB  0.02  0.91   33      up
11    hdd  1.00000   1.00000  5.5 TiB  1.1 GiB  132 MiB  2.8 MiB  1021 MiB  5.5 TiB  0.02  0.87   23      up
12    hdd  1.00000   1.00000  5.5 TiB  1.3 GiB  358 MiB   19 MiB  1005 MiB  5.5 TiB  0.02  1.04   36      up
13    hdd  1.00000   1.00000  5.5 TiB  1.3 GiB  301 MiB  734 KiB  1023 MiB  5.5 TiB  0.02  1.00   39      up
                       TOTAL   76 TiB   18 GiB  4.1 GiB   54 MiB    14 GiB   76 TiB  0.02
MIN/MAX VAR: 0.87/1.14  STDDEV: 0.00
```