This paper examines the coordination of vocal and bodily behavior in the multilayered activity of dance teaching, where teachers simultaneously explain and perform. The aim is to show how talk is adjusted to the rhythm and character of the dance on the one hand, and how dance is fitted into the evolving grammar on the other. The study focuses on the emergence of a specialized grammar capable of incorporating embodied demonstrations. In the teachers' actions, the temporalities of talk and dance are mutually adjusted and intertwined, resulting in inherently multimodal patterns of sense-making that serve various instructional and other social tasks. Calling into question the analytic boundary between grammar and the body, the paper argues that projection cross-cuts modalities.