Traditional machine learning approaches to malware detection and classification often rely on handcrafted features and supervised learning, which limits their generalizability and leaves them vulnerable to adversarial attacks. This paper presents MalwarePT, a binary-level foundation model designed for robust malware analysis. MalwarePT extracts raw bytes from the code segments of Windows PE malware binaries and employs a BERT-based architecture trained through self-supervised Masked Language Modeling on a large corpus of unlabeled malware samples. This pretraining enables the model to learn byte-level patterns and dependencies inherent in malicious code. MalwarePT is then fine-tuned with specialized classification heads for two downstream tasks: functionality classification and malware detection. Evaluations on a diverse dataset of 155,238 unpacked Windows executables demonstrate that MalwarePT outperforms state-of-the-art models such as Malconv2 and Ember-based classifiers in functionality identification, achieving higher precision, recall, and F1-scores. In malware detection, MalwarePT matches the performance of leading ML-based detectors while exhibiting superior robustness to temporal shifts and to various adversarial attacks, including novel code-based evasion techniques. The model remains effective even in the presence of packing, a common malware obfuscation method. An ablation study confirms the advantage of MalwarePT’s multi-head attention mechanism over alternative neural network architectures. A security discussion addresses potential evasion strategies and proposes solutions to further enhance MalwarePT’s resilience. Overall, MalwarePT advances malware analysis by providing a more generalizable and adversarially robust foundation model that reduces reliance on handcrafted features and offers a promising direction for future research on resilient malware detection systems.
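
To make the pretraining objective concrete, the following is a minimal sketch of how byte-level Masked Language Modeling examples could be constructed from a code-segment byte string. It is an illustration under stated assumptions, not MalwarePT's actual preprocessing: the 15% masking rate and the 80/10/10 replacement split follow the original BERT recipe, and the vocabulary layout (byte values 0 to 255 plus one mask token), the function name `mask_bytes`, and the label convention are hypothetical choices made for this example.

```python
import random

# Hypothetical vocabulary layout (an assumption, not taken from the paper):
# ids 0..255 are raw byte values and id 256 is the [MASK] token.
VOCAB_BYTES = 256
MASK_ID = 256
# Label for positions that are not prediction targets. -100 is the default
# ignore_index of PyTorch's CrossEntropyLoss; any sentinel would work here.
IGNORE = -100


def mask_bytes(code, mask_prob=0.15, seed=None):
    """Build one MLM example (input_ids, labels) from a bytes object.

    Each position is selected with probability mask_prob. A selected
    position is replaced by [MASK] 80% of the time, by a random byte 10%
    of the time, and left unchanged 10% of the time (BERT-style split,
    assumed here). Labels hold the original byte at selected positions
    and IGNORE everywhere else.
    """
    rng = random.Random(seed)
    input_ids = list(code)
    labels = [IGNORE] * len(input_ids)
    for i, original in enumerate(code):
        if rng.random() >= mask_prob:
            continue  # position not selected: input and label stay as-is
        labels[i] = original
        roll = rng.random()
        if roll < 0.8:
            input_ids[i] = MASK_ID
        elif roll < 0.9:
            input_ids[i] = rng.randrange(VOCAB_BYTES)
        # otherwise keep the original byte as the input token
    return input_ids, labels


if __name__ == "__main__":
    # Toy stand-in for bytes taken from a PE code segment.
    sample = bytes([0x55, 0x8B, 0xEC, 0x83, 0xEC, 0x10, 0xC3]) * 8
    ids, labels = mask_bytes(sample, seed=0)
    print(sum(label != IGNORE for label in labels), "positions selected")
```

During pretraining, a model would be trained to predict the original byte at every position whose label is not `IGNORE`, which is what lets it learn from unlabeled samples.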