Network softwarization has evolved significantly since programmable data planes became a topic of interest in academia and industry. Programming Protocol-Independent Packet Processors (P4) is a language for defining packet forwarding behavior. Forwarding devices programmed in P4 offer a flexible way to define headers, parse graphs, and data plane logic. However, extending the data plane with additional functionality affects the latency that packets experience in the data plane. For this reason, this paper analyzes the key factors that affect the data plane latency of packets processed by a Tofino-based target (Tofino Native Architecture, TNA), which can be considered the de facto production-ready, P4-programmable Application-Specific Integrated Circuit (ASIC). Our work first provides an extensive set of latency measurements and then presents a set of data plane latency predictions obtained using a model derived from the latency results and machine learning (ML) algorithms. We demonstrate that the PCA-lasso polynomial (PLP) obtains the best results among the algorithms tested. In the best case, PLP achieves a prediction accuracy of 98.22% when considering the parser, the deparser, and the control block for traffic running at 10 Gb/s (SFP+) and 100 Gb/s (QSFP28). To the best of our knowledge, this is the first work to provide such comprehensive profiling, including a method to predict data plane latency in production-grade Tofino ASIC-based switching hardware, which could be leveraged to obtain accurate latency estimates prior to investment and deployment.