In the past few years, SQL and Python have emerged as the lingua francas of data. As of 2021, they are the 3rd and 4th most used programming languages, in large part because of their popularity among data scientists and analysts.

Figure: Most popular technologies, from Stack Overflow.

They are sometimes presented as competing alternatives, and evangelists for one or the other are happy to proselytize the advantages of their preferred language. There's also a perception that Python is for "real Data Scientists", while SQL is for the less-technical masses. Despite being debunked multiple times over the years (an excellent and recent example being Pedram Navid's "For SQL"), the myth persists.

While it's true that SQL is super approachable and Python has more flexibility, like any language they each have real strengths and weaknesses that show up throughout the data analysis process. You would expect analytics practitioners to use both on a task-by-task basis wherever and whenever needed, lapsing into SQL for fast filtering halfway through a Python analysis.

Yet workflows in either language typically live in completely different tools. This creates a siloing effect, with users of one or the other unable to meaningfully collaborate on an analysis or workflow. It also creates friction for users who want to learn new skills: the jump from writing a quick SQL query to installing and diving into Python is big, and intimidating. For those already confident in both SQL and Python, it's frustrating to have to decide whether to context switch between tools or just muddle through a workflow in Python that they know would be simpler in SQL.

Today, we're changing this with the introduction of Dataframe SQL in Hex. Now you don't have to choose between SQL and Python for a data task: they can be seamlessly combined in one workflow, drawing on the advantages of each language anywhere in an analysis. To understand why this is such a profound change, let's first review some history.

SQL: the OG

SQL has been doing data for a very, very long time. All the way back in 1974, Donald Chamberlin published a paper called SEQUEL: A Structured English Query Language. Yep, that's SQL! In the 47 years since its initial emergence, very little has really changed about the core syntax. Database advancements and dialect-specific functions have brought more power, speed, and functionality, but these are all just helpful additions to the same core language.

One of SQL's main appeals is its ease and readability. The learning curve is very short, with a clean syntax that reads similarly to English. This makes it approachable for those newer to writing code or working with data.

And now, half a century in, SQL is having quite a moment. A few years ago it was looking like SQL might be eclipsed by distributed, non-relational NoSQL systems, especially for large-scale workloads. As far as analytics and data science are concerned, however, these alternatives have been fairly well vanquished. SQL has cemented its position as a universal interface for data, and a new generation of data tools is being built around it as a core principle. According to the makers of TimescaleDB, "balance has been restored to the force".

Where SQL is renowned for primordial simplicity and efficiency, Python benefits from infinite flexibility and a rich ecosystem. Python first emerged 30 years ago, in 1991, and quickly caught on due to its readability and ease of use. As a fully featured, object-oriented programming language, it can be used to develop complex logic well beyond what's possible in SQL, whether that's hitting an API or training a model.

Pandas, the de facto analytics library for Python, provides high-performance DataFrames, the primary tabular structure for exploring, cleaning, and processing data in Python. The ecosystem around Pandas is amazing, and essential to its popularity and dominance in data science. For almost any use case, you can be sure someone has already solved the problem and published some code that you can easily pull into your work and use on a dataframe. You can easily add in GeoPandas to work with geospatial data, scikit-learn for machine learning, the endlessly customizable matplotlib for charts, or any of the 320,000+ other packages available on PyPI.

Other statistical languages, like R or Julia, have devoted followings and dynamic ecosystems. JavaScript, too, is strong among creators of interactive visualizations. They haven't yet, however, reached the same ubiquity as Python and Pandas.

We've established that both SQL and Python have their own superpowers, and are useful for different things. Their amazing utility and ecosystem dominance mean that in 2021, most technical data practitioners are bilingual in them to some extent.
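To make the idea of dropping into SQL partway through a Python analysis concrete, here is a minimal sketch. It is not Hex's Dataframe SQL feature or its API. It uses only Python's standard-library `sqlite3` module and plain tuples instead of a Pandas DataFrame, so it stays dependency-free. The `orders` data, the table layout, and the `total_by_customer` helper are all hypothetical names invented for illustration.

```python
import sqlite3

# Hypothetical sample rows standing in for a table loaded earlier in an analysis.
orders = [
    ("alice", "books", 12.50),
    ("bob", "games", 60.00),
    ("alice", "games", 30.00),
    ("carol", "books", 8.25),
    ("bob", "books", 15.00),
]


def total_by_customer(rows, min_amount):
    """Filter and aggregate with SQL, returning {customer: total}."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute(
            "CREATE TABLE orders (customer TEXT, category TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        # The filtering and grouping step reads naturally in SQL.
        cursor = conn.execute(
            """
            SELECT customer, SUM(amount) AS total
            FROM orders
            WHERE amount >= ?
            GROUP BY customer
            ORDER BY customer
            """,
            (min_amount,),
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


# SQL handles the set-based work...
totals = total_by_customer(orders, min_amount=10)

# ...and Python picks up again for arbitrary follow-on logic.
top_customer = max(totals, key=totals.get)
print(totals, top_customer)
```

The split is the point rather than the specific tooling: the set-based filter and aggregate live in a short query, while the surrounding program stays in Python. With Pandas installed, the same pattern could move data in and out of a DataFrame instead of a list of tuples.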