That's more of a storage thing; RAM does much smaller transfers. For example, a DDR5 module has two independent 32-bit (4-byte) channels with a minimum of 16 transfers in a single "operation", so it moves 64 bytes at once (or more). And CPUs don't waste memory bandwidth by transferring more than absolutely necessary, as memory is often the bottleneck even without writing full pages.
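To put numbers on that, here is a quick sketch of the arithmetic. The channel width and burst length are the ones mentioned above; the 4 KiB page size is my own assumption, only there for comparison.

```python
# Bytes moved by one minimum-length burst on a single DDR5 channel.
CHANNEL_WIDTH_BITS = 32   # one of the two independent 32-bit channels
BURST_LENGTH = 16         # minimum number of transfers per "operation"

bytes_per_transfer = CHANNEL_WIDTH_BITS // 8
bytes_per_burst = bytes_per_transfer * BURST_LENGTH

# Assumption for comparison only: a 4 KiB page.
PAGE_SIZE = 4096
bursts_per_page = PAGE_SIZE // bytes_per_burst

print(bytes_per_burst)   # 64
print(bursts_per_page)   # 64
```

So under that assumption, moving a whole page would take 64 minimum-length bursts, compared with a single burst for one 64-byte chunk.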
The page size is relevant for memory protection (where the CPU stops the program and gives control back to the operating system if the program tries to do something it's not allowed to do with the memory) and virtual memory (in practice handled by the same mechanism, though the two are theoretically independent concepts). The operating system needs to build a table describing which memory the program can access and what kind of access it has, and with bigger pages the table can be much smaller (at the cost of wasting space if the program needs only a little bit of memory of a given kind).
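A rough sketch of that trade-off, counting only one table entry per page and ignoring the multi-level structure real page tables have. The 4 KiB and 2 MiB page sizes and the `page_stats` helper are assumptions picked for illustration.

```python
def page_stats(alloc_bytes, page_size):
    """Return (pages needed, bytes wasted) when rounding up to whole pages."""
    pages = (alloc_bytes + page_size - 1) // page_size  # integer ceiling
    wasted = pages * page_size - alloc_bytes
    return pages, wasted

KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# Mapping 1 GiB: bigger pages need far fewer entries.
print(page_stats(GIB, 4 * KIB))       # (262144, 0)
print(page_stats(GIB, 2 * MIB))       # (512, 0)

# Mapping 10 KiB: bigger pages waste almost all of the page.
print(page_stats(10 * KIB, 4 * KIB))  # (3, 2048)
print(page_stats(10 * KIB, 2 * MIB))  # (1, 2086912)
```

The first pair shows the table shrinking by a factor of 512; the second pair shows the wasted space when a small allocation gets a big page.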
That's a reasonable per-core size, and it doesn't make much sense to add all the cores up if your goal is to fit your data within L2 (like in the article).