A CPU is like a team of 16 expert engineers. Each one can solve complex problems — branching logic, decision trees, operating system tasks. They work fast and handle anything you throw at them, but there are only 16 of them.
A GPU is like a factory floor with 16,000 workers. Each one can only do simple arithmetic, but they all work at the same time. If your job is "multiply these 16,000 pairs of numbers," the GPU finishes in one step, while the CPU needs roughly 1,000 steps because each of its 16 engineers must work through 1,000 pairs one at a time.
This is the core trade-off: latency vs throughput. CPUs minimize the time to finish a single task (latency). GPUs maximize the total amount of work completed at once (throughput), even if any individual operation takes longer.
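Here is a minimal CUDA sketch of that "multiply 16,000 pairs" job. The kernel name, array contents, and launch configuration are illustrative choices, not anything prescribed above: the GPU kernel gives every pair its own thread, while the CPU version walks the pairs one at a time in a loop.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// GPU: each thread handles exactly one pair, and all pairs run in parallel.
__global__ void multiplyPairs(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] * b[i];
}

// CPU: a single worker walks the array one pair at a time.
void multiplyPairsCpu(const float* a, const float* b, float* out, int n) {
    for (int i = 0; i < n; ++i) out[i] = a[i] * b[i];
}

int main() {
    const int n = 16000;              // the 16,000 pairs from the analogy
    size_t bytes = n * sizeof(float);

    // Host data: with b[i] = 2, the product a[i] * b[i] should be 2 * i.
    float *a = (float*)malloc(bytes), *b = (float*)malloc(bytes), *out = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) { a[i] = (float)i; b[i] = 2.0f; }

    // Copy inputs to the GPU.
    float *dA, *dB, *dOut;
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dOut, bytes);
    cudaMemcpy(dA, a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, b, bytes, cudaMemcpyHostToDevice);

    // Launch enough threads to cover all 16,000 pairs at once.
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    multiplyPairs<<<blocks, threadsPerBlock>>>(dA, dB, dOut, n);
    cudaDeviceSynchronize();

    cudaMemcpy(out, dOut, bytes, cudaMemcpyDeviceToHost);
    printf("out[100] = %.1f (expected 200.0)\n", out[100]);

    cudaFree(dA); cudaFree(dB); cudaFree(dOut);
    free(a); free(b); free(out);
    return 0;
}
```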