Emotion detection is a crucial component of human-computer interaction, including dialog systems. Speech emotion recognition is the task of predicting the emotional content of speech and classifying an utterance into one of several labels (e.g., happy, sad, neutral, or angry). Various learning methods have been applied to improve the performance of emotion classifiers. Since speech data is collected at low cost on users' local devices and privacy is essential, federated learning is a natural choice of training architecture. In this project, we propose an advanced federated learning framework for emotion detection applications. Moreover, because each user's voice pattern and emotional state differ, the locally generated data are highly non-IID. To address this problem, we apply personalized federated learning, which maintains a specific local model for each user.
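The personalization idea above can be sketched as follows. This is a minimal illustration, not the paper's actual method: it assumes each client's model splits into a shared part that is averaged on the server (a FedAvg-style step) and a personal head that never leaves the device. All names (`Client`, `fed_round`) and the toy parameter vectors are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

class Client:
    """One user's on-device model (illustrative, not the paper's architecture)."""
    def __init__(self, shared_dim=4, head_dim=4):
        self.shared = rng.normal(size=shared_dim)  # aggregated across clients
        self.head = rng.normal(size=head_dim)      # personal, kept local

    def local_update(self):
        # Placeholder for local training on the client's private speech data:
        # both parts are updated locally, but only `shared` is sent to the server.
        self.shared += 0.1 * rng.normal(size=self.shared.shape)
        self.head += 0.1 * rng.normal(size=self.head.shape)

def fed_round(clients):
    for c in clients:
        c.local_update()
    # Server averages only the shared parameters (FedAvg step);
    # the personal heads stay on-device, preserving per-user behavior.
    global_shared = np.mean([c.shared for c in clients], axis=0)
    for c in clients:
        c.shared = global_shared.copy()

clients = [Client() for _ in range(3)]
for _ in range(5):
    fed_round(clients)

# After training, the shared parts agree while the heads remain personalized.
print(np.allclose(clients[0].shared, clients[1].shared))  # True
print(np.allclose(clients[0].head, clients[1].head))      # False
```

Keeping the classifier head local is one common way to handle non-IID data in personalized federated learning; the framework proposed here may differ in which layers are shared and how aggregation is performed.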