How I designed a modern crypto trading system that combines Proximal Policy Optimization (PPO) with LLM-powered "coaching" using LangGraph