That’s strange, and the document needs correcting. Please check the attached pcap for a better understanding:
Packets 14 to 19 are the Phase 1 negotiation; packets 18 and 19 are encrypted.
Packets 20, 21 and 22 are the Phase 2 negotiation, and they are also encrypted.
Packets from 23 onward are the actual ESP traffic.
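If it helps to check this breakdown programmatically, here is a minimal sketch that reads the 28-byte ISAKMP header of a UDP payload and reports which phase the message belongs to and whether its payloads are encrypted. It assumes IKEv1 over plain UDP port 500 (no NAT-T on port 4500, which prepends a 4-byte non-ESP marker) and that you have already extracted the UDP payload bytes from the capture. The function name `classify_isakmp` is hypothetical. The exchange-type and flag values are the ones defined in RFC 2408 and RFC 2409.

```python
import struct

# Exchange types from RFC 2408 (ISAKMP) and RFC 2409 (IKEv1).
EXCHANGE_TYPES = {
    2: "Phase 1 (Main Mode / Identity Protection)",
    4: "Phase 1 (Aggressive Mode)",
    5: "Informational",
    32: "Phase 2 (Quick Mode)",
}
# ISAKMP header E flag: payloads following the header are encrypted.
FLAG_ENCRYPTION = 0x01


def classify_isakmp(payload: bytes) -> tuple[str, bool]:
    """Hypothetical helper: classify an ISAKMP message from its UDP payload.

    Returns (exchange description, encrypted?).
    """
    if len(payload) < 28:
        raise ValueError("ISAKMP header is 28 bytes")
    # Header layout: 8-byte initiator cookie, 8-byte responder cookie,
    # next payload, version, exchange type, flags, 4-byte message ID, 4-byte length.
    (_icookie, _rcookie, _next, _version,
     exchange, flags, _msg_id, _length) = struct.unpack("!8s8sBBBBII", payload[:28])
    description = EXCHANGE_TYPES.get(exchange, f"exchange type {exchange}")
    return description, bool(flags & FLAG_ENCRYPTION)
```

Applied to this capture, packets 14 to 17 should report Main Mode with the E flag clear, 18 and 19 Main Mode with it set, and 20 to 22 Quick Mode with it set. The ESP packets from 23 onward are not ISAKMP at all: they are IP protocol 50, so they would be identified at the IP layer rather than with this function.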
Please note that the choice of protocol (ESP or AH) is negotiated in the first message of Phase 2, so neither can be used until the Phase 2 negotiation is complete.
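The protocol choice travels in the Protocol-ID field of the Proposal payload inside that first Phase 2 message. Because the message is encrypted, this sketch assumes you have already decrypted it by some other means and isolated the bytes of one Proposal payload. The function name `parse_proposal` is hypothetical. The field layout follows the generic Proposal payload in RFC 2408, and the Protocol-ID values are those of the IPsec DOI in RFC 2407.

```python
import struct

# Protocol-ID values for IPsec DOI proposals (RFC 2407).
PROTOCOL_IDS = {1: "ISAKMP", 2: "AH", 3: "ESP", 4: "IPCOMP"}


def parse_proposal(payload: bytes) -> dict:
    """Hypothetical helper: parse one already-decrypted ISAKMP Proposal payload."""
    if len(payload) < 8:
        raise ValueError("Proposal payload header is 8 bytes")
    # Layout: next payload, reserved, 2-byte payload length, proposal number,
    # protocol ID, SPI size, number of transforms, then the variable-length SPI.
    (next_payload, _reserved, length, number,
     proto_id, spi_size, n_transforms) = struct.unpack("!BBHBBBB", payload[:8])
    spi = payload[8:8 + spi_size]
    return {
        "next_payload": next_payload,
        "length": length,
        "proposal_number": number,
        "protocol": PROTOCOL_IDS.get(proto_id, f"unknown ({proto_id})"),
        "spi": spi.hex(),
        "transforms": n_transforms,
    }
```

A proposal with Protocol-ID 3 and a 4-byte SPI would decode as ESP together with the SPI that later appears in the ESP packets' headers.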