Public-key cryptography is the basis for security and privacy in distributed systems such as the Internet, for e-commerce, and for virtually all modern cryptographic protocols. Most public-key cryptosystems involve computation-intensive arithmetic operations (e.g., 1024-bit modular exponentiation), which cause unacceptably long delays on constrained devices such as smart cards. For this reason, current-generation smart cards are equipped with a cryptographic co-processor. However, relying on special-purpose hardware for public-key cryptography limits both scalability and algorithm agility. Public-key cryptosystems normally spend most of their execution time in a few performance-critical code segments with well-defined characteristics (e.g., inner loops), which makes them amenable to processor specialization. The project described in this proposal investigates instruction-level enhancements that raise the performance of embedded RISC processors executing cryptographic workloads. We will focus on the low-level arithmetic operations used in public-key cryptography, such as addition, multiplication, squaring, modular reduction, inversion, and division in multiplicative groups or finite fields of very high order (160-2048 bits). The first goal of this research project is the design, prototype implementation, and testing of a SPARC V8-compatible processor with an extended instruction set optimized for public-key cryptography. The second project goal is to develop and analy