Python Bytes

by Michael Kennedy and Brian Okken

Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. The show is a short discussion on the headlines and noteworthy news in the Python, developer, and data science space.

  

Latest Episodes

#238 A cloud-based file system for Python and a new GUI!

Watch the live stream:

Watch on YouTube

About the show

Sponsored by Sentry:

  • Sign up at pythonbytes.fm/sentry
  • And please, when signing up, click “Got a promo code? Redeem” and enter PYTHONBYTES

Special guest: Julia Signell

Brian #1: Practical SQL for Data Analysis

  • Haki Benita
  • Pandas is awesome, but … “In this article I demonstrate how to use SQL to perform fast and efficient data analysis.”
  • The first part of the article argues:
    • SQL is faster than Pandas
    • But they are great together
  • Then tons of examples showing exactly how to best use SQL queries and Pandas in data analysis:
    • Basics including random data and sampling
    • Descriptive statistics
    • Subtotals including rollup and grouping sets
    • Pivot tables, both conditional expressions and aggregate expressions
    • Running and cumulative aggregation
    • Linear Regression
    • Interpolation
  • Super cheat sheet for useful SQL queries
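The article works in PostgreSQL. As a rough, self-contained illustration of two of the techniques listed above (cumulative aggregation and subtotals), here is a sketch using Python's built-in sqlite3 module instead; the `sales` table and its rows are made up. It assumes SQLite 3.25 or newer for window functions, and because SQLite has no ROLLUP, the grand total is emulated with UNION ALL.

```python
import sqlite3

# Hypothetical daily sales table, kept in memory for the demo
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day INTEGER, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(1, "east", 10), (1, "west", 5), (2, "east", 20), (2, "west", 15)],
)

# Cumulative aggregation with a window function: running total per region
running = conn.execute(
    """
    SELECT region, day, amount,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
    FROM sales
    ORDER BY region, day
    """
).fetchall()

# Subtotals: SQLite has no ROLLUP, so append a grand-total row with UNION ALL
subtotals = conn.execute(
    """
    SELECT region, SUM(amount) FROM sales GROUP BY region
    UNION ALL
    SELECT 'TOTAL', SUM(amount) FROM sales
    """
).fetchall()

for row in running:
    print(row)
print(subtotals)
```

In practice you could hand a query like this to `pandas.read_sql` and let the database do the aggregation before the data ever reaches a DataFrame.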

Michael #2: Git Blame in your Python Tracebacks

  • via Ruslan Portnoy, by Ofer Koren
  • Helpful Modules: traceback & linecache
  • traceback uses linecache to fetch source lines, and we can change the text linecache returns for a line
  • They add a bit of git blame functionality that appends blame info to each line’s source
  • It turns out this also flows through to tools like pdb.
  • Ripe for a proper package we could add to requirements-dev.txt
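A minimal sketch of the mechanism described above, using only the standard library: wrap `linecache.getline` so every source line it returns carries an extra note. `annotate` and `getline_with_blame` are hypothetical names; a real tool would replace `annotate` with an actual `git blame` lookup (for example `git blame -L n,n -- file`), which this sketch does not do.

```python
import linecache

_original_getline = linecache.getline

def annotate(filename, lineno):
    """Hypothetical placeholder: a real tool would run git blame for this line."""
    return f"blame: {filename}:{lineno}"

def getline_with_blame(filename, lineno, module_globals=None):
    line = _original_getline(filename, lineno, module_globals)
    if not line.strip():
        return line  # unknown file or blank line: leave untouched
    # Append the note as a Python comment so the source line still parses
    return line.rstrip("\n") + "  # " + annotate(filename, lineno) + "\n"

# traceback, among others, looks up source text through linecache.getline
linecache.getline = getline_with_blame
```

After the patch, any formatted traceback shows lines such as `raise ValueError('x')  # blame: ...`, because traceback asks linecache for the text.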

Julia #3: fsspec: a unified file system library

  • Martin Durant
  • Other libraries conform to the interface so that each part of the analysis pipeline is like an interchangeable building block (for example s3fs, gcsfs)
  • With the cloud providers competing to host data, fsspec makes it easy to swap out the read layer so that you can hop clouds.

Brian #4: The need for slimmer containers or I’m even more confused now as to the usefulness of official base images on Docker Hub

  • Ivan Velichko @iximiuz
  • I read this article recently and it had me concerned. Then just yesterday I read it again and there are some updates. I’m still concerned, but now also confused. So let’s run it down.
  • docker scan can be run on official Python images.
  • Spoiler: all of the official Python images have vulnerabilities except Alpine.
    • But in an update, the author says that Alpine has a bunch of problems of its own.
  • The update includes some discussion on Hacker News
    • vulnerability scanners tend to have lots of false positives
    • official base images are rarely updated
    • some people suggest adding an upgrade command in the beginning of every Dockerfile.
    • but others object saying that the practice leads to unrepeatable builds
  • So, I’m left wondering whether using official Python images is even worth it.
  • Michael: Python’s official image on docker hub
  • Michael: PEP 656 -- Platform Tag for Linux Distributions Using Musl
  • Michael: We dive a lot into this in our latest Talk Python recording (not out yet, but live stream is available)
  • Some stats:
    • Ubuntu: Found 32 vulnerabilities, 31 with upgrade.
    • python:latest: Found 364 vulnerabilities, 353 with upgrade.
    • Ubuntu with source Python: 35 total (28 low, 7 medium), several from intermediate tools such as wget, gcc, etc.
    • Removing many dev tools (e.g. wget, gcc) SHOULD lower the count, but doesn’t.
    • Switching from python:3.9 to python:3.9-slim-buster dropped the issues to 69.

Michael #5: PandasGUI: A GUI for analyzing Pandas DataFrames

  • Features:
    • View DataFrames and Series (with MultiIndex support)
    • Interactive plotting
    • Filtering
    • Statistics summary
    • Data editing and copy / paste
    • Import CSV files with drag & drop
    • Search toolbar
  • Best way to see what it’s about is to watch the video.

Julia #6: xarray: pandas-like API for labeled N-dimensional data

  • We’ve been talking a lot about the pandas API and how it’s a common target for dataframe libraries.
  • Xarray is not a dataframe library, it’s for labeled N-dimensional data.
  • People use it in geosciences, and in image processing where they don’t have tabular data, but the axes mean something (lat, lon, time, band…)
  • You can select, aggregate, resample, using the real dimension labels.
  • It can be backed with dask arrays or numpy arrays (or other types of arrays).
  • It supports plotting with .plot

Extras

Michael

Brian

  • Someone responded to me the other day on twitter with an emoji that I was not clear on the meaning of. So I looked it up on emojipedia.org. Super useful for occasionally out of touch people like myself.
  • pytestbook.com (redirects to pythontest.com/pytest-book/) has a facelift and a new home, to get ready for an announcement later this week. It’s built on Markdown, Hugo, GitHub, and Netlify, so changes can be done super quick with just a commit and push. I just needed a nice readable theme, and Pradyun’s blog looked great, so I copied his choices.
  • The blog will eventually also have writing, the legacy posts worth keeping from pythontesting.net, and probably transcripts from Test & Code.

Julia

  • GH CLI
  • entrypoints: they are so cool! For example, pandas can plot with different backends, not just matplotlib, and the logic for those backends lives in the plotting libraries, not in pandas.
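To see the entry-point mechanism in action, here is a small sketch that lists whatever plugins are installed for a group using `importlib.metadata`. The group name `pandas_plotting_backends` is the one pandas consults for plotting backends; `discover_plugins` is a hypothetical helper, and the result is simply empty if no such plugin is installed.

```python
from importlib.metadata import entry_points

def discover_plugins(group):
    """Return {name: EntryPoint} for every installed plugin registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9: a plain dict of group -> entry points
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}

# pandas looks for plotting backends registered under this group name
backends = discover_plugins("pandas_plotting_backends")
for name, ep in backends.items():
    # ep.value is "module:attribute"; ep.load() would import it
    print(name, "->", ep.value)
```

The host library only knows the group name; each plotting package declares its own entry point in its packaging metadata, which is why the backend logic can live outside pandas.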

Joke

From https://upjoke.com/programmer-jokes

  • I asked a programmer what her New Year's resolution will be.
  • She answered: 1920x1080.

  • How does a programmer confuse a mathematician?

  • x = x + 1

  • Why do Python programmers have low self esteem?

  • They're constantly comparing their self to other.


Audio Download

Posted on 15 June 2021 | 8:00 am


#237 Separate your SQL and Python, asynchronously with aiosql

Watch the live stream:

Watch on YouTube

About the show

Sponsored by Sentry:

  • Sign up at pythonbytes.fm/sentry
  • And please, when signing up, click “Got a promo code? Redeem” and enter PYTHONBYTES

Special guest: Mike Groves

Michael #1: Textual

  • Textual (Rich.tui) is a TUI (Text User Interface) framework for Python using Rich as a renderer.
  • Rich TUI will integrate tightly with its parent project, Rich.
  • This project is currently a work in progress and may not be usable for a while.

Brian #2: Pinning application dependencies with pip-tools compile

  • via John Hagen
  • pip-tools has more functionality than this, but compile alone is quite useful
  • Start with a loose list of dependencies in requirements.in:
    rich
    typer
  • It can also contain version specifiers such as >= if you need to constrain a dependency.
  • Now pip install pip-tools, and pip-compile requirements.in
  • or python -m piptools compile requirements.in
    • both have same effect.
  • Now you’ll have a requirements.txt file with pinned dependencies:
    # autogenerated by: pip-compile requirements.in
    click==7.1.2
        # via typer
    colorama==0.4.4
        # via rich
    commonmark==0.9.1
        # via rich
    pygments==2.9.0
        # via rich
    rich==10.2.2
        # via -r requirements.in
    typer==0.3.2
        # via -r requirements.in
  • Now, do the same with a dev-requirements.in to create dev-requirements.txt.
  • Then, of course:
    - `pip install -r requirements.txt`
    - `pip install -r dev-requirements.txt`
    - And test your application.
    - All good? Push changes.
  • To force pip-compile to update all packages in an existing requirements.txt, run pip-compile --upgrade.
  • John provided an example project that uses this workflow: python-blueprint

Mike #3: Pynguin

  • Automated test generation
  • Pynguin is a framework that allows automated unit test generation for Python. It is an extensible tool that allows the implementation of various test-generation approaches.

Michael #4: Python Advisory DB

  • via Brian Skinn
  • A community owned repository of advisories for packages published on pypi.org.
  • Much of the existing set of vulnerabilities are collected from the National Vulnerability Database CVE feed.
  • Vulnerabilities are integrated into the Open Source Vulnerabilities project, which provides an API to query for vulnerabilities.
  • Longer term, we are working with the PyPI team to build a pipeline to automatically get these vulnerabilities [listed] into PyPI.
  • Tracks known security issues with the packages, for example:
    PYSEC-2020-28.yaml
    id: PYSEC-2020-28
    package:
      name: bleach
      ecosystem: PyPI
    details: In Mozilla Bleach before 3.12, a mutation XSS in bleach.clean when RCDATA
      and either svg or math tags are whitelisted and the keyword argument strip=False.
    affects:
      ranges:
      - type: ECOSYSTEM
        fixed: 3.1.2
      versions:
      - '0.1'
      - 0.1.1
      - 0.1.2
      - '0.2'
    ...
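To make the `affects` section concrete, here is a deliberately simplified sketch that checks a version against the record above. It assumes plain dotted-integer versions and treats `fixed` as "everything earlier is affected"; `ADVISORY`, `parse_version`, and `is_affected` are hypothetical, and real tooling should use `packaging.version.Version` and the full OSV range semantics instead.

```python
# A simplified, hypothetical Python mirror of the YAML fields shown above
ADVISORY = {
    "id": "PYSEC-2020-28",
    "package": {"name": "bleach", "ecosystem": "PyPI"},
    "affects": {
        "ranges": [{"type": "ECOSYSTEM", "fixed": "3.1.2"}],
        "versions": ["0.1", "0.1.1", "0.1.2", "0.2"],
    },
}

def parse_version(text):
    """Assumes plain dotted integers like '3.1.2' (no rc/post/dev tags)."""
    return tuple(int(part) for part in text.split("."))

def is_affected(advisory, version):
    affects = advisory["affects"]
    if version in affects["versions"]:
        return True
    # Treat each range as "every version before `fixed` is affected"
    return any(
        parse_version(version) < parse_version(r["fixed"])
        for r in affects["ranges"]
        if "fixed" in r
    )

for v in ["0.1", "3.1.1", "3.1.2", "3.2"]:
    print(v, is_affected(ADVISORY, v))
```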

Brian #5: Function Overloading with singledispatch and multipledispatch

  • by Martin Heinz
  • I kinda avoid using the phrase “The Correct Way to …”, but you do you, Martin.
  • In C++, we can overload functions, which means multiple functions with the same name but different parameter types just work.
  • In Python, that doesn’t happen automatically, but you can get the same effect.
  • It’s in the stdlib with functools and singledispatch:
    from functools import singledispatch
    from datetime import date, time

    @singledispatch
    def format(arg):
        return arg

    @format.register
    def _(arg: date):
        return f"{arg.day}-{arg.month}-{arg.year}"

    @format.register(time)
    def _(arg):
        return f"{arg.hour}:{arg.minute}:{arg.second}"
  • Now format works like two functions:
    print(format(date(2021, 5, 26)))
    # 26-5-2021
    print(format(time(19, 22, 15)))
    # 19:22:15
  • What if you want to switch on the type of multiple parameters? multipledispatch, a third party package, does the trick:
    from multipledispatch import dispatch

    @dispatch(list, str)
    def concatenate(a, b):
        a.append(b)
        return a

    @dispatch(str, str)
    def concatenate(a, b):
        return a + b

    print(concatenate(["a", "b"], "c"))
    # ['a', 'b', 'c']
    print(concatenate("Hello", "World"))
    # HelloWorld
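One detail worth knowing about `singledispatch`: it picks the implementation by walking the argument’s class hierarchy, so subclasses fall back to their parent’s handler. The sketch below repeats the example with the function renamed `fmt` (so it doesn’t shadow the built-in `format`) and shows that a `datetime`, which subclasses `date`, uses the `date` version.

```python
from datetime import date, datetime, time
from functools import singledispatch

@singledispatch
def fmt(arg):
    # Fallback for any type without a registered implementation
    return str(arg)

@fmt.register
def _(arg: date):
    return f"{arg.day}-{arg.month}-{arg.year}"

@fmt.register
def _(arg: time):
    return f"{arg.hour}:{arg.minute}:{arg.second}"

# datetime subclasses date, so it dispatches to the date implementation
print(fmt(datetime(2021, 5, 26, 19, 22, 15)))  # 26-5-2021
# Unregistered types use the base function
print(fmt(42))  # 42
# dispatch() reports which implementation would run for a given type
print(fmt.dispatch(datetime) is fmt.dispatch(date))  # True
```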

Mike #6: Aiosql

  • Fast Async SQL Template Engine
  • Lightweight replacement for ORM libraries such as SQLAlchemy.
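The idea behind aiosql is keeping SQL in its own text, with named queries, instead of assembling it in Python. The stdlib-only sketch below imitates that idea with `-- name:` comments and `sqlite3`; `load_queries` is a hypothetical helper, not aiosql’s API, and it skips everything aiosql adds on top (async drivers, result handling, and so on).

```python
import re
import sqlite3

SQL = """
-- name: create-users
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);

-- name: add-user
INSERT INTO users (name) VALUES (:name);

-- name: all-users
SELECT id, name FROM users ORDER BY id;
"""

def load_queries(sql_text):
    """Map each '-- name: x' block to its SQL, using identifier-friendly names."""
    queries = {}
    for block in re.split(r"^-- name:\s*", sql_text, flags=re.MULTILINE)[1:]:
        name, _, body = block.partition("\n")
        queries[name.strip().replace("-", "_")] = body.strip()
    return queries

queries = load_queries(SQL)
conn = sqlite3.connect(":memory:")
conn.execute(queries["create_users"])
conn.execute(queries["add_user"], {"name": "Ada"})  # named parameter :name
rows = conn.execute(queries["all_users"]).fetchall()
print(rows)  # [(1, 'Ada')]
```

Keeping the SQL in one text blob (or a .sql file) means it can be edited, linted, and reviewed as SQL, while Python only refers to queries by name.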

Extras

Michael

  • SoftwareX: Elsevier has an open-access software journal, via Daniel Mulkey. There's even a special issue collection on software contributing to gravitational wave discovery.
  • Python 3.10.0b2 is available
  • Django security releases issued: 3.2.4, 3.1.12, and 2.2.24
  • Talks on YouTube for PyCon 2021.
  • aicsimageio 4.0 released, lots of goodness for bio-image analysis and microscopy, thanks Madison Swain-Bowden.

Mike

Joke

Bank robbers


Audio Download

Posted on 9 June 2021 | 8:00 am


#236 Fuzzy wuzzy wazzy fuzzy was faster

Watch the live stream:

Watch on YouTube

About the show

Sponsored by Sentry:

  • Sign up at pythonbytes.fm/sentry
  • And please, when signing up, click “Got a promo code? Redeem” and enter PYTHONBYTES

Special guest: Anastasiia Tymoshchuk

Brian #1: Using accessible colors, monolens & CMasher

  • Tweet by Matthew Feickert, @HEPfeickert
    • “I need to give some serious praise to fellow Scikit-HEP dev Hans Dembinski on his excellent monolens tool for interactive simulation of kinds of color blindness. It works really quite well and the fact that is a pipx install away is awesome!”
  • monolens lets you “view part of your screen in greyscale or simulated colorblindness”
    • So simple. Just pops up a box that you can drag around your monitor and view stuff in greyscale.
  • Reply tweet by Niko, @NikoSercevic
    • “I mean to use cmasher so I know it’s cb friendly”
  • CMasher : “Scientific colormaps for making accessible, informative and cmashing plots”
    • Provides a collection of scientific colormaps and utility functions to be used by different Python packages and projects, mainly in combination with matplotlib.
    • Lots of great colormaps that are color blindness friendly.
    • Just specify the CB friendly colormaps with plots, super easy.
    # Import CMasher to register colormaps
    import cmasher as cmr

    # Import packages for plotting
    import matplotlib.pyplot as plt
    import numpy as np

    # Access rainforest colormap through CMasher or MPL
    cmap = cmr.rainforest                   # CMasher
    cmap = plt.get_cmap('cmr.rainforest')   # MPL

    # Generate some data to plot
    x = np.random.rand(100)
    y = np.random.rand(100)
    z = x**2+y**2

    # Make scatter plot of data with colormap
    plt.scatter(x, y, c=z, cmap=cmap, s=300)
    plt.show()
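Why a greyscale view is a useful check: two colors with very different hues can collapse to nearly the same grey. A rough sketch using the Rec. 709 luma weights on gamma-encoded 0-255 RGB values; this is a simplification, and monolens’s own conversion may differ.

```python
def luma(r, g, b):
    """Rec. 709 luma of a gamma-encoded 0-255 RGB color (a simple greyscale approximation)."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

red = (255, 0, 0)
dark_green = (0, 76, 0)

# Very different hues, almost identical grey levels
print(round(luma(*red), 1), round(luma(*dark_green), 1))  # 54.2 54.4
```

Colormaps such as CMasher’s are designed so lightness changes steadily across the map, which avoids this kind of collapse.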

Michael #2: rapidfuzz: Rapid fuzzy string matching in Python and C++

  • via Mikael Honkala
  • Rapid fuzzy string matching in Python and C++ using the Levenshtein Distance
  • “you mention fuzzywuzzy for fuzzy text matching in the last episode, and wanted to mention the rapidfuzz package as a high-performance alternative.”
  • “non-rigorous performance testing of several alternatives (including fuzzywuzzy), and rapidfuzz came out on top with a sizable margin.”
  • Simple Ratio example:
    >>> from rapidfuzz import fuzz
    >>> fuzz.ratio("this is a test", "this is a test!")
    96.55171966552734
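For intuition about that number, here is a pure-Python sketch of an insertion/deletion-based similarity: 100 × 2 × LCS / (len(a) + len(b)), where LCS is the longest common subsequence. As far as we understand, rapidfuzz’s `fuzz.ratio` is defined as a normalized Indel similarity, and for this pair the sketch gives about 96.55, matching the value above up to float precision; it is a teaching aid, not rapidfuzz’s optimized C++ implementation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def indel_ratio(a, b):
    """Similarity in [0, 100]: 100 * 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    return 100.0 if total == 0 else 100 * 2 * lcs_length(a, b) / total

# 14 shared characters out of 14 + 15 total, so 2800 / 29
print(indel_ratio("this is a test", "this is a test!"))
```

This O(len(a) × len(b)) loop is exactly the kind of work rapidfuzz speeds up, which is where its performance margin comes from.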

Anastasiia #3: Structlog to improve your logs

  • One of the best ways to improve logs is to add more structure to them
  • Why do we even need to care about logs?
    • logs can provide visibility to production, what is actually happening
    • logs can help to improve tracing of a bug, especially if logs are machine-readable and easily parseable
    • logs can give you a clue why a bug or an exception occurred
  • It’s super easy to get started with structlog, and it’s also easy to integrate it with the ELK stack for further processing
  • Features that you will get if you switch your logs to structlog:
    • readable structure of logs in key-value pairs
    • easy to parse with any post processor to visualise logs and to have more visibility for your code
    • you can create custom log levels and separate specific logs with event keys for each log
  • I have been working with structured logs for a couple of years and recommend everyone give them a try
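To show what "key-value pairs" look like in practice, here is a stdlib-only sketch that emits one JSON object per log record via a custom `logging.Formatter`. `JSONFormatter` and the `fields` convention are hypothetical; with structlog itself the call would look roughly like `structlog.get_logger().info("user_logged_in", user_id=42, ip="10.0.0.1")`, with its processors handling the rendering.

```python
import io
import json
import logging

class JSONFormatter(logging.Formatter):
    """Render each record as one JSON object of key-value pairs (hypothetical helper)."""
    def format(self, record):
        payload = {"event": record.getMessage(), "level": record.levelname.lower()}
        payload.update(getattr(record, "fields", {}))  # extra key-value context
        return json.dumps(payload, sort_keys=True)

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JSONFormatter())

logger = logging.getLogger("structured_demo")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

logger.info("user_logged_in", extra={"fields": {"user_id": 42, "ip": "10.0.0.1"}})
line = stream.getvalue().strip()
print(line)
# {"event": "user_logged_in", "ip": "10.0.0.1", "level": "info", "user_id": 42}
```

Because every line is a self-describing JSON object, a log shipper can index `user_id` or `event` directly instead of parsing free-form text.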

Brian #4: xfail now works with pytest-subtests

Michael #5: BaseSettings in Pydantic

  • via Denis Roy
  • Create a model that inherits from BaseSettings
  • The model initialiser will attempt to determine the values of any fields not passed as keyword arguments by reading from the environment.
  • This makes it easy to:
    • Create a clearly-defined, type-hinted application configuration class
    • Automatically read modifications to the configuration from environment variables
    • Manually override specific settings in the initialiser where desired (e.g. in unit tests)
  • Get values from OS ENV or .env files
  • Also has support for secrets files
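A stdlib-only sketch of the same pattern (fields filled from environment variables unless passed explicitly), useful for seeing what BaseSettings automates. `Settings`, `load`, and the `APP_` prefix are hypothetical, and the type conversion here is far cruder than pydantic’s validation. With pydantic you would instead subclass `BaseSettings` (from `pydantic` in v1, or from the separate `pydantic-settings` package in v2).

```python
import os
from dataclasses import dataclass, fields

_TRUTHY = {"1", "true", "yes", "on"}

@dataclass
class Settings:
    """Hypothetical app config: explicit kwargs win, then environment variables, then defaults."""
    api_url: str = "http://localhost"
    port: int = 8000
    debug: bool = False

    @classmethod
    def load(cls, prefix="APP_", **overrides):
        values = {}
        for fld in fields(cls):
            if fld.name in overrides:
                values[fld.name] = overrides[fld.name]
                continue
            raw = os.environ.get(prefix + fld.name.upper())
            if raw is None:
                continue  # keep the dataclass default
            if fld.type is bool:
                values[fld.name] = raw.strip().lower() in _TRUTHY
            else:
                values[fld.name] = fld.type(raw)  # e.g. int("9000") for port
        return cls(**values)

# In a unit test you might override a single setting explicitly:
print(Settings.load(debug=True))
```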

Anastasiia #6: Take care of the documentation and your team will thank you later

  • Sphinx and ReadTheDocs will make developers’ lives so much easier
  • Everyone knows the importance of documentation, but how do you keep it up to date?
  • In my experience, I tried using Confluence, describing new features in detailed Jira tickets, and writing some hints in Google Docs and sharing them with the team. It does not work, as the documentation gets outdated and piles up drastically
  • Benefits of implementing continuous documentation for the code:
    • easy to support by writing docstrings, updating them when needed
    • easy to find needed information in a centralised documentation
    • easy to keep versioning for each new release of the code
    • ReadTheDocs is free for open source code
    • Sphinx will generate code reference documentation for the code

Extras

Michael

Brian

  • pytest uses. Please comment on this thread if you know of some great projects that use pytest, if they converted from something else, or just find it interesting that they use pytest.

Joke

First time recursion


Audio Download

Posted on 2 June 2021 | 8:00 am


#235 Flask 2.0 Articles and Reactions

Watch the live stream:

Watch on YouTube

About the show

Sponsored by Sentry:

  • Sign up at pythonbytes.fm/sentry
  • And please, when signing up, click “Got a promo code? Redeem” and enter PYTHONBYTES

Special guest: Vincent D. Warmerdam koaning.io, Research Advocate @ Rasa and maintainer of a whole bunch of projects.

Intro: Hello and welcome to Python Bytes, where we deliver Python news and headlines directly to your earbuds. This is episode 235, recorded May 26, 2021. I’m Brian Okken.

Brian #1: Flask 2.0 articles and reactions

Michael #2: Python 3.11 will be 2x faster?

  • via Mike Driscoll
  • From the Python Language summit
  • Guido asks “Can we make CPython faster?”
  • We covered the Shannon Plan for speedups.
  • Small team funded by Microsoft: Eric Snow, Mark Shannon, and Guido himself (might grow)
  • Constraints: Mostly don’t break things.
  • How to reach 2x speedup in 3.11
    • Adaptive, specializing bytecode interpreter
    • “Zero overhead” exception handling
    • Faster integer internals
    • Put __dict__ at a fixed offset (-1?)
  • There’s machine code generation in our future
  • Who will benefit
    • Users running CPU-intensive pure Python code
    • Users of websites built in Python
    • Users of tools that happen to use Python

Vincent #3: DEON

  • DEON, a project with meaningful checklists for data science projects!
    • It’s a command line app that can generate checklists.
    • You customize checklists
    • There’s a set of examples (one for each check) that explain why each check matters.
    • There’s a little course on calmcode that covers it.

Brian #4: 3 Tools to Track and Visualize the Execution of your Python Code

  • Khuyen Tran
  • Loguru — print better exceptions
    • we covered in episode 111, Jan 2019, but still super cool
  • snoop — print the lines of code being executed in a function
    • covered in episode 141, July 2019, also still super cool
  • heartrate — visualize the execution of a Python program in real-time
    • this is new to us, woohoo
  • Nice to have one person’s take on a group of useful tools
    • Plus great images of them in action.
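Tools like snoop print each executed line along with variable values. A common hook for this kind of tool in CPython is `sys.settrace`; the sketch below uses it to record the line number and local variables for each line a function runs. `trace_lines` is a hypothetical decorator, not snoop’s or heartrate’s implementation, and tracing slows code down considerably.

```python
import sys
from functools import wraps

def trace_lines(func):
    """Hypothetical decorator: record (line number, locals) for each line `func` executes."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        events = []
        code = func.__code__

        def tracer(frame, event, arg):
            if frame.f_code is not code:
                return None  # ignore every other function's frames
            if event == "line":
                events.append((frame.f_lineno, dict(frame.f_locals)))
            return tracer  # keep receiving line events for this frame

        previous = sys.gettrace()
        sys.settrace(tracer)
        try:
            return func(*args, **kwargs)
        finally:
            sys.settrace(previous)
            wrapper.events = events  # expose the recording for inspection
    return wrapper

@trace_lines
def total(n):
    s = 0
    for i in range(n):
        s += i
    return s

print(total(4))
for lineno, local_vars in total.events:
    print(lineno, local_vars)
```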

Michael #5: DuckDB + Pandas

  • via __AlexMonahan__
  • What’s DuckDB? An in-process SQL OLAP database management system
  • SQL on Pandas: After your data has been converted into a Pandas DataFrame often additional data wrangling and analysis still need to be performed. Using DuckDB, it is possible to run SQL efficiently right on top of Pandas DataFrames.
  • Example
    import pandas as pd
    import duckdb

    mydf = pd.DataFrame({'a' : [1, 2, 3]})
    print(duckdb.query("SELECT SUM(a) FROM mydf").to_df())
  • When you run a query in SQL, DuckDB will look for Python variables whose name matches the table names in your query and automatically start reading your Pandas DataFrames.
  • For many queries, you can use DuckDB to process data faster than Pandas, and with a much lower total memory usage, without ever leaving the Pandas DataFrame binary format (“Pandas-in, Pandas-out”).
  • The automatic query optimizer in DuckDB does lots of the hard, expert work you’d need in Pandas.

Vincent #6:

  • I work for a company called Rasa. We make a Python library for building virtual assistants, and there are a few community projects. There are a bunch of cool showcases, but one stood out when I was checking our community showcase last week: a project that sends people forest fire updates over text. The project is open-sourced on GitHub and can be found here. There’s also a GIF demo here.
    • Amit Tallapragada and Arvind Sankar observed that in the early days of the fires, news outlets and local governments provided a confusing mix of updates about fire containment and evacuation zones, leading some residents to evacuate unnecessarily. They teamed up to build a chatbot that would return accurate information about conditions in individual cities, including nearby fires, air quality, and weather data.
    • What’s cool here isn’t just Vincent being biased (again, he works for Rasa); it’s also a nice example of grass-roots impact. You can make a lot of impact if there are open APIs around.
    • They host a scraper that scrapes fire/weather info every 10 seconds. It also fetches evacuation information.
    • You can text a number and it will send you up-to-date info based on your city. It will also notify you if there’s an evacuation order/plan.
    • They even do some fuzzy matching to make sure that your city is matched even when you make a typo.
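Typo-tolerant lookup like this can be sketched with the standard library’s `difflib.get_close_matches`. The city list and the 0.75 cutoff are made-up assumptions, and the project itself may use a different matcher entirely.

```python
from difflib import get_close_matches

# Hypothetical list of supported cities
CITIES = ["Sacramento", "Santa Rosa", "Redding", "Paradise", "Napa"]

def match_city(text, cities=CITIES, cutoff=0.75):
    """Return the closest known city name, or None if nothing is similar enough."""
    lookup = {city.lower(): city for city in cities}  # case-insensitive matching
    hits = get_close_matches(text.strip().lower(), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

print(match_city("Sacremento"))  # Sacramento
print(match_city("paradice"))    # Paradise
print(match_city("zzz"))         # None
```

Returning None below the cutoff lets the bot ask the user to try again instead of confidently sending updates for the wrong city.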

Extras

Michael

Vincent: Human-Learn: a suite of tools to have humans define models before resorting to machines.

  • It’s scikit-learn compatible.
  • One of the main features is that you’re able to draw a model!
  • There’s a small guide that shows how to outperform a deep learning implementation by doing exploratory data analysis. It turns out, you can outperform Keras sometimes.
  • There’s a suite of tools to turn python functions into scikit-learn compatible models. Keyword arguments become grid-search-able.
  • There’s a tutorial on calmcode.io for anybody interested.
  • Can also be used for Bulk Labelling.

Joke


Audio Download

Posted on 26 May 2021 | 8:00 am


#234 The Astronomy-filled edition with Dr. Becky

Watch the live stream:

Watch on YouTube

About the show

Sponsored by Sentry:

  • Sign up at pythonbytes.fm/sentry
  • And please, when signing up, click “Got a promo code? Redeem” and enter PYTHONBYTES

Special guest: Dr. Becky Smethurst

Brian #1: Powering the Python Package Index in 2021

  • Dustin Ingram
  • A lot has changed in 5 years since the previous write-up
  • From 3 people to
    • 3 maintainers/admins
    • 5 moderators
    • 3 committers
  • Companies donate about $1.8M per month in services
    • Fastly, mostly
    • Google Cloud ~ $10K
    • AWS ~ $7K
    • Also Statuspage, Sentry, Datadog, Digicert, Pingdom
  • Awesome grants to fund projects
    • rewrite of PyPI
    • Localization, internationalization, API tokens and 2FA
    • Malware Detection and Update Framework
    • Foundational Tool Improvements & Productionized Malware Detection
    • Support Staff (a project manager)
  • Growth, now up to (per day)
    • 1.7B requests to PyPI
    • 55.4 TB served by PyPI
  • Next steps
    • FUNDABLES.md, which is a non-exhaustive wishlist of large projects we’d like to see happen
    • become a member, donate, or volunteer

Michael #2: The Leuven Star Atlas

  • via Shahrin Ahmad
  • Making a publication-quality stellar atlas from scratch
  • Plotting one page of the atlas: There is one single python script that takes care of the plotting of a single page of the atlas (plot_map.py). At the moment it is 1545 lines long
  • The goal was to produce a publication quality, both practical and visually pleasing star atlas aimed at amateur astronomers.
  • Took about 1.5 months to build/develop
  • Libraries used:
    • numpy for all kinds of data handling and numerical operations
    • pylab / matplotlib for all the main plotting operations
    • basemap for the mapping (takes care of the projection and the related transformations)
    • scipy for some specific interpolations and contours connected to the Milky Way
    • astropy and pyephem for celestial coordinate transformations
  • Source data: All databases that I am using are either publicly available from the internet (under various licenses), or they are compiled by me from publicly available data (links in the article)
  • One of the main new features of my atlas (compared to other atlases on the market) is the inclusion of the (as) precise (as possible) contours of the Milky Way on its pages.
  • Interesting library: adjustText - automatic label placement for matplotlib
  • The whole process takes around 4 hours on my laptop (using 4 cores in parallel).
  • Whole thing reminds me of the quote: “Data cleaning isn’t grunt work, it is THE work.”

Becky #3: TI-84 Plus CE Python graphing calculator

  • I remember being so attached to my graphing calculator at school, and I swear I haven’t used it since I was 18 - they were banned from my university exams
  • I remember a very pixelated screen, almost like an original Game Boy, and plotting was the worst - but what if you could have colour plots in Python?
  • Teaching kids to code early is so important, but learning to code with no purpose is also incredibly difficult. Learning it alongside everything else makes it second nature, and when something is second nature it becomes a tool you can use to solve a whole host of problems

Brian #4: Python Package CI/CD with GitHub Actions

  • Johanan Idicula
  • Nice write up of working with GH Actions
  • Triggers from push or pull request
  • Matrix runs
    • Running jobs across different build environments
    • ubuntu
    • macos
    • windows
    • Different Python versions
  • Caching some tools to not have to load them for each combination
    • example caches Poetry
  • Running tests, of course
  • Checking artifacts
  • Auto-merge some branches
  • Release automation to pypi on ‘v*’ tag pushes

Michael #5: SpaceX is using Python for prototyping their Starlink satellite software

  • via Garett Dunn
  • From four-part series on the software that powers SpaceX
  • The software breaks down roughly into two parts: 1) software that flies and 2) software that supports the flying components.
  • For Starlink, one of the main challenges is that our “towers” are orbiting Earth, forcing your path to the internet to change very frequently.
  • The Earth-side network then provides continuous updates on traffic conditions and constellation changes, while each satellite updates the ground on its planned trajectory.
  • Starlink software, both in satellites and on the ground, is written almost exclusively in C++
  • But the prototyping is done in … Python.
  • The software is developed in a continuous integration environment, with teams merging into the master development branch often and deploying to the fleet of satellites in space each week.
  • Live views: findstarlink.com, starlink.sx, and starlinkradar.com/livemap.html
  • The Python version allows for rapid iteration during the design phase. Once we are happy with the results of an algorithm, we port it to C++ so it runs efficiently in production.

Becky #6: A beginner’s guide to working with astronomical data

  • It’s a scientific paper, but it has huge sections on using Python to analyse images, remove noise, and all the other steps needed, written not just for me as a professional but, I hope, in a way amateurs will find useful too
  • Huge shoutout to astropy (Michael mentioned it before), which has revolutionised the field, and not just for professionals but also for those keen amateur astrophotographers who perhaps use a Raspberry Pi to drive their telescope or to analyse their images

Extras

Michael

Becky

Joke


Audio Download

Posted on 19 May 2021 | 8:00 am


#233 RaaS: Readme as a Service

Watch the live stream:

Watch on YouTube

About the show

Sponsored by us! Support our work through:

Special guest: Marlene Mhangami

Brian #1: readme.so

  • Recommended by Johnny Metz
  • This is not only useful, it’s fun
  • Interactively create a README.md file
  • The suggested sections are great
  • There are lots of sections though, so really only pick the ones you are willing to fill in.
  • I think this is nicer than the old standby of “copying the README.md of another project” because that other project might not have some of these great sections, like:
    • Acknowledgements
    • API Reference
    • Authors
    • FAQ
    • Features
    • Logo
    • Roadmap
    • Usage/Examples
    • Running Tests
  • Note, these sections are listed in alphabetical order, not necessarily the right order for how they should go in your README.md
  • Produces a markdown file you can copy or download
  • There’s also an editor, so you can edit right there. (But I’d probably throw together the skeleton with dummy text and edit it in something with vim emulation.)

Michael #2: Wafer-scale Python

  • via Galen Swint
  • Many new processors with the sole purpose of accelerating artificial intelligence and machine learning workloads.
  • Cerebras, a chip company, built an AI-oriented chip that is 12”x12” (about 30 cm × 30 cm) with 850,000 AI cores on board.
  • Another way to look at it: that’s 2.6T transistors vs. my M1’s 0.016T (16 billion), roughly 160 times as many.
  • Built through TSMC, as so many things seem to be these days.
  • What’s the Python angle here? A key to the design is the custom graph compiler, which takes PyTorch or TensorFlow models and maps each layer to a physical part of the chip, allowing for asynchronous compute as the data flows through.
  • Shipping soon for just $3M+.

Marlene #3: RAPIDS

  • This is the library I’m currently working on at NVIDIA. I work specifically on cuDF, which is a Python GPU DataFrame library for loading, joining, aggregating, filtering, and manipulating tabular data using a DataFrame-style API.
  • It mirrors the Pandas API but operations are done on the GPU
  • I gave a talk at PyCon Sweden a few months ago called ‘A Beginner’s Guide to GPUs for Pythonistas’.
  • Here’s an example of how long it takes for pandas vs. cudf to calculate the mean of a group of numbers in a column in a DataFrame:
    # We'll be calculating the mean of the data in a DataFrame (table).
    # Note: %timeit below is an IPython/Jupyter magic, so run this in a notebook.
    import cudf
    import pandas as pd
    import numpy as np

    # Let's create a DataFrame using pandas that has two columns, a and b.
    # Each column contains one hundred million rows, and each row holds a
    # random integer from 0 up to (but not including) one hundred million.
    pandas_df = pd.DataFrame({"a": np.random.randint(0, 100000000, size=100000000),
                              "b": np.random.randint(0, 100000000, size=100000000)})

    # Next, create a cuDF version of this DataFrame (this copies the data to the GPU).
    cudf_df = cudf.DataFrame.from_pandas(pandas_df)

    # Now we'll use %timeit to compare the time it takes to calculate the mean
    # of the numbers in column "a" of the DataFrame.

    # Let's time pandas
    %timeit pandas_df.a.mean()

    # Let's time cuDF
    %timeit cudf_df.a.mean()

    # These were the results I got (might be a little slower if you're using the notebook on Colab):
    # pandas: 105 ms ± 298 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
    # cudf: 1.83 ms ± 4.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
  • You can try this out right now for free using the RAPIDS GPU-powered notebook on Google Colab.

Brian #4: datefinder and dateutil

    import datefinder
    date_strings = [
        "March 12 2010",
        "2010-03-12",
        "03/12/2010 12:42:12"
    ]
    [list(datefinder.find_dates(d)) for d in date_strings]
    # [[datetime.datetime(2010, 3, 12, 0, 0)],
    #  [datetime.datetime(2010, 3, 12, 0, 0)],
    #  [datetime.datetime(2010, 3, 12, 12, 42, 12)]]
  • Nice focused library, used by 662 projects, according to GitHub
  • datefinder finds dates in strings, then uses dateutil to parse them into datetime objects.
  • dateutil is actually kind of amazing also, great for
    • parsing date strings
    • computing relative deltas (next month, last week of the month, etc.)
    • relative deltas between dates and/or datetimes
    • amazing timezone support
    • comprehensive test suite
    • nice mix of both pytest and unittest. I’ll have to ask Paul Ganssle about that sometime.
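For the "relative deltas" idea, here is a stdlib-only sketch of two common calculations: shifting by whole months with the day clamped to the month’s length (which, as far as we know, is also how `relativedelta(months=...)` behaves), and finding the last given weekday of a month. `add_months` and `last_weekday_of_month` are hypothetical helpers; dateutil handles many more cases.

```python
import calendar
from datetime import date

def add_months(d, months):
    """Shift a date by whole months, clamping the day to the target month's length."""
    month_index = d.month - 1 + months
    year, month = d.year + month_index // 12, month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)

def last_weekday_of_month(year, month, weekday):
    """`weekday` uses Monday=0 ... Sunday=6, like date.weekday() and calendar.MONDAY."""
    last_day = calendar.monthrange(year, month)[1]
    d = date(year, month, last_day)
    return d.replace(day=last_day - (d.weekday() - weekday) % 7)

print(add_months(date(2021, 1, 31), 1))                 # 2021-02-28
print(add_months(date(2021, 12, 15), 1))                # 2022-01-15
print(last_weekday_of_month(2021, 5, calendar.FRIDAY))  # 2021-05-28
```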

Michael #5: Cinder - Instagram's performance oriented fork of CPython

  • via Anthony Shaw
  • Instagram's performance oriented fork of CPython.
  • They use a multi-process webserver architecture; the parent process starts, performs initialization work (e.g. loading code), and forks tens of worker processes to handle client requests.
  • The overhead due to copy-on-write from reference counting long-lived objects turned out to be significant. They developed a solution called "immortal instances" to provide a way to opt-out objects from reference counting.
  • "Shadowcode" or “shadow bytecode" is their inline caching implementation. It observes particular optimizable cases in the execution of generic Python opcodes and (for hot functions) dynamically replaces those opcodes with specialized versions.
  • Eager coroutine evaluation: If a call to an async function is immediately awaited, we immediately execute the called function up to its first await.
  • The Cinder JIT is a method-at-a-time custom JIT implemented in C++, and it can achieve 1.5-4x speed improvements on many Python performance benchmarks.
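  • Strict modules is a few things rolled into one: a static analyzer that checks a module's top-level code has no side effects visible outside that module, an immutable StrictModule type, and a loader that recognizes modules opted in to strict mode.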
  • Static Python is an experimental bytecode compiler that makes use of type annotations to emit type-specialized and type-checked Python bytecode.
  • Static Python plus Cinder JIT achieves 7x the performance of stock CPython on a typed version of the Richards benchmark.

Marlene #6: PyCon US 2021

  • PyCon US starts today. It’s the largest gathering of the Python community on earth!
  • I’ll be hosting the Diversity and Inclusion Work Group Meet and Greet. I recently became the chair of this WG, which focuses on helping increase global diversity and inclusion in the Python community. We’ll be going live on the main stage at PyCon on Saturday 15 May at 12pm EST. There will be lots of time for discussion, so I hope to see some of you there!
  • I’ll also be hosting the PSF EMEA members meeting, which will be on Saturday at 10am CAT. You can register on the Meetup page or watch the livestream on the PSF YouTube channel. You can also find me in the PSF booth on Friday and Saturday morning, if you’d like to meet there!
  • Some other talks I’m looking forward to attending are:
    • Python Performance at Scale - Making Python Faster at Instagram
    • More Fun With Hardware and CircuitPython - IoT, Wearables, and more!
    • Large Scale Data Validation (with Spark and Dask)
  • Registration will be open all through the conference, so if you haven’t registered yet, you can register here

And of course all the keynotes this year!

Extras

Michael

Brian

Marlene

Joke


Audio Download

Posted on 12 May 2021 | 8:00 am


#232 PyPI in a box and a revolutionary keyboard

Watch the live stream:

Watch on YouTube

About the show

Sponsored by us! Support our work through:

Special guest: Annette Lewis

Brian #1: Sphinx Themes Gallery update

  • Curated and maintained by @pradyunsg and @shirou.
  • I actually don’t know what it looked like before, but this is great.
  • I’m working on my first real Sphinx project, so this is awesome to have.
  • Features:
    • Main image for each theme shows what the theme looks like in wide, narrow, and phone layouts
    • Demos (click on an image):
      • Main page that shows you:
        • quick start: install and config theme name
        • Link to theme documentation
        • Example of Navigation
      • Kitchen sink
        • paragraph level markup
          • including inline, math, meta, blocks, code with sidebars, references, directives, footnotes, and more
        • API documentation example
          • essential if you are using this for documenting code
        • Lists and tables
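On each gallery page, the quick start comes down to installing the theme and naming it in conf.py. Below is a minimal sketch. It assumes the Furo theme (one of the themes in the gallery) and a hypothetical project name. The autodoc extension is included because the API documentation examples rely on it:

```python
# conf.py: a minimal Sphinx configuration (illustrative values only)
project = "my-project"  # hypothetical project name

# autodoc pulls API documentation out of your code's docstrings
extensions = ["sphinx.ext.autodoc"]

# Any theme from the gallery, installed first (e.g. `pip install furo`)
html_theme = "furo"
```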

Michael #2: Mongita - Like SQLite but for MongoDB

  • Mongita is a lightweight embedded document database that implements a commonly-used subset of the MongoDB/PyMongo interface.
  • Instead of being a server, Mongita is a self-contained Python library
  • Mongita can be configured to store its documents either on disk or in memory.
  • This is a great project to contribute to as a new open source person, details.
  • Uses:
    • Embedded database: Mongita is a good alternative to SQLite for embedded applications when a document database makes more sense than a relational one.
    • Unit testing: Mocking PyMongo/MongoDB is a pain. Worse, mocking can hide real bugs. By monkey-patching PyMongo with Mongita, unit tests can be more faithful while remaining isolated.
    • Limited dependencies: Mongita runs anywhere that Python runs. Currently the only dependencies are pymongo (for bson) and sortedcontainers (for faster indexes).

Annette #3: World Plone Day 2021 - Over 50 Videos from 16 Countries

  • World Plone Day was a 24-hour online streaming event held on April 28th 2021.
    • Plone is an open-source content management system, written in Python and built on top of the Zope web framework
  • The Plone community produced 56 videos totaling 22 hours of content.
  • More than 50 speakers from 16 countries, in 11 languages.
  • All available on YouTube: World Plone Day 2021 playlist
  • Variety of content categories:
    • General Interest
    • Technical Talks
    • Case Studies
    • Plone 6
      • Plone 6 introduction
      • How does Plone 6 work under the hood?
      • Getting Started with Volto Customization

Brian #4: The social contract of open source : view every commit as a gift

  • Brett Cannon
  • Interesting thoughts on what “contract” and what relationship exist between maintainers and users.
  • Great analogy: a stack of USB drives with source code on a front lawn, with a “FREE” sign.
    • Come by and pick up the latest release whenever you want
    • No guarantee at all
    • Each new version is a gift that you can accept or not
    • The receiver of the gift should NOT:
      • knock on the front door and yell at the developer
      • leave an angry letter in the mailbox
      • stand in the middle of the street in town yelling about how much they hate the software or how much of an idiot the developer is
  • Quote from Immanuel Kant: “Act in such a way that you treat humanity, whether in your own person or in the person of any other, never merely as a means to an end, but always at the same time as an end.”
  • Brett: “… when you treat a maintainer as a fellow human being who may be able to do you a favor of their own volition, then you end up in an appropriate relationship where you are not trying to use the maintainer for something specific.”
  • Summary: “Every commit of open source code should be viewed as an independent gift from the maintainer that they happened to leave on their front yard for others to enjoy if they so desire; treating them as a means to an end for their open source code is unethical.”

Michael #5: PyPI in a box

  • via Jared Chung
  • Connectivity is still a challenge in many countries, especially in Africa
  • Vuyisile Ndlovu created PyPI in a Box. Post PyCon Africa, in the conference slack group, attendees shared the most common problems across the continent, and the state of internet connectivity was the overwhelming response.
  • Vuyisile also references putting “StackOverflow in a box” but the article doesn’t lay out how to do it.

Annette #6: Film simulations from scratch using Python

  • by Kevin Martin Jose
  • Implements applying CLUTs (color lookup tables) to an image with Python
  • Opens the image with PIL, then converts it into a NumPy array
  • Iterates through all the pixel values and maps each one to its LUT color cell
  • Returns the filtered image built from the array
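The article's steps can be sketched with NumPy. This is not Kevin's code, and it differs from it in three ways. First, it replaces the per-pixel loop with vectorized array indexing. Second, it uses plain nearest-cell lookup with no interpolation. Third, it uses an inverted identity LUT as a stand-in for a real film-simulation LUT. The function names are hypothetical.

```python
import numpy as np


def identity_clut(size: int) -> np.ndarray:
    """Build an identity 3D color LUT of shape (size, size, size, 3), indexed [r, g, b]."""
    levels = np.rint(np.linspace(0, 255, size)).astype(np.uint8)
    clut = np.empty((size, size, size, 3), dtype=np.uint8)
    clut[..., 0] = levels[:, None, None]  # red varies along the first axis
    clut[..., 1] = levels[None, :, None]  # green along the second
    clut[..., 2] = levels[None, None, :]  # blue along the third
    return clut


def apply_clut(image: np.ndarray, clut: np.ndarray) -> np.ndarray:
    """Map every pixel of a uint8 RGB image (H, W, 3) to its nearest LUT cell."""
    size = clut.shape[0]
    # Scale 0-255 channel values to cell indices 0..size-1 (nearest cell, no interpolation)
    idx = np.rint(image.astype(np.float64) * (size - 1) / 255).astype(np.intp)
    return clut[idx[..., 0], idx[..., 1], idx[..., 2]]


# A real "film look" would come from a LUT file; inverting the identity LUT is a stand-in
negative_clut = 255 - identity_clut(18)
image = np.array([[[0, 15, 30], [255, 120, 60]]], dtype=np.uint8)
negative = apply_clut(image, negative_clut)
```

With Pillow, `np.asarray(Image.open(path).convert("RGB"))` produces an array like `image`, and `Image.fromarray(negative)` turns the result back into an image.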

Extras

Michael

Annette

Joke

A developer-focused keyboard (graphic)


Audio Download

Posted on 5 May 2021 | 8:00 am


#231 Go Python, Go!

Watch the live stream:

Watch on YouTube

About the show

Sponsored by us! Support our work through:

Special guests:

Brian #1: For-Else: A Weird but Useful Feature in Python

  • Yang Zhou
  • After a for loop, you can put an else block.
  • The else block only executes when there is no break in the loop. If the loop got all the way to the end, and off the end, the else block will run.
  • First, I’m not used to putting break or else anywhere in my Python code, so I’m also curious why you’d want to do this.
  • Yang explains the feature, then talks about 3 scenarios for use:
    • Iterate and find items without needing a flag variable.
      • break when you find what you are looking for, and the else only runs if you didn’t find it.
    • Help to break out of nested loops
      • I’m still confused by this one
    • Help to handle exceptions
      • Kind of a cool use: try/except in a for loop, with a break in the except block. Then the else block is for code that should run only when no exceptions were caught.
  • Takeaway: The first reason wins it for me. I hate it when I feel I need to add a “found” flag to some code. else seems cleaner.
  • Also: Please add comments to else blocks. Many people won’t know how they work, so a short explanation can help tons.
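Here is a minimal sketch of the three scenarios Yang describes. The function names are hypothetical. In the nested-loop version, the inner loop's else runs continue when nothing matched. As a result, the break after it only runs when the inner loop itself broke.

```python
def first_negative(numbers):
    """Return the first negative number, or None, without a 'found' flag."""
    for n in numbers:
        if n < 0:
            break
    else:
        # No break happened: the loop ran off the end without finding one
        return None
    return n


def find_position(matrix, target):
    """Return (row, col) of target in a list of lists, or None."""
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value == target:
                break  # leave the inner loop
        else:
            continue  # inner loop found nothing: move on to the next row
        break  # inner loop hit `break`, so leave the outer loop too
    else:
        return None  # outer loop never hit `break`: target not found
    return (i, j)


def parse_all(strings):
    """Return every string converted to int, or None if any of them fails."""
    numbers = []
    for s in strings:
        try:
            numbers.append(int(s))
        except ValueError:
            break  # stop at the first bad value
    else:
        # Reached only when the loop never hit `break`, i.e. no exception was caught
        return numbers
    return None
```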

Michael #2: Tortoise ORM

  • Familiar asyncio ORM for Python, built with relations in mind
  • I’ve seen this ORM popping up around the async web stories a lot these days
  • Similar to Django’s ORM
  • Tortoise ORM is supported on CPython >= 3.7 for SQLite, MySQL and PostgreSQL.
  • They offer a nice, broad perf comparison on their github page
  • Really nice and clean API for ORM things, again on the github page
  • Tortoise ORM uses Aerich as database migrations tool

Cecil #3: Faster Python with Go Shared objects

  • Leverage Go's standard library and ecosystem in Python
  • Language interop is good for productivity
  • Passing data is limited to primitive types

Brian #4: Learn by reading code: Python standard library design decisions explained (for advanced beginners)

  • Reading code is a great way to improve your own coding.
  • What code should you read?
    • If it’s great code, reading it can improve your own.
    • If it’s scary code, it might not be so good, and might teach you bad practices
  • Python stdlib is there and has some interesting features:
    • all of the code is available
    • PEPs are available so you can read the discussions that went into it while you are reading the code, or before
    • This is huge. Most code you’ll find, even within companies, doesn’t have “why we did this” explanations.
  • However…
    • it is not uniform
    • different authors
    • some is old, and pythonic was different 10-20 years ago
    • lots of code around to preserve backwards compatibility
  • So here’s some recommendations:
    • statistics : code is simple, well documented, PEP has design decisions and comparisons
    • pathlib: good object-oriented example, good comparative study, as you can also read os.path
    • dataclasses: extremely well documented, good example of dataclasses
    • graphlib: does one thing, an implementation of a topological sort algorithm. No PEP, but there is an issue with a discussion thread about the API decisions
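graphlib is small enough to take in at a glance, both as code and as an API. Here is a short sketch of using it on Python 3.9+. The build-step names are made up for illustration:

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps to the set of nodes it depends on (its predecessors)
steps = {
    "download": {"resolve"},
    "install": {"download"},
    "test": {"install"},
}
order = list(TopologicalSorter(steps).static_order())
print(order)  # ['resolve', 'download', 'install', 'test']

# A cycle has no valid ordering, so graphlib raises CycleError
try:
    list(TopologicalSorter({"a": {"b"}, "b": {"a"}}).static_order())
except CycleError as exc:
    print("cycle detected:", exc.args[1])
```

Here the graph is a simple chain, so the printed order is the only valid one. With branching graphs, static_order() may return any valid topological order.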

Related: https://devops.com/learning-curve-computer-programming-languages/

Michael #5: Gradio: Create UIs for prototyping your machine learning model in 3 minutes

  • via David Smit
  • Quickly create customizable UI components around your models.
  • Gradio makes it easy for you to "play around" with your model in your browser
  • Drag-and-drop in your own images, pasting your own text, recording your own voice, etc. and seeing what the model outputs.
  • Gradio is useful for:
    • Creating demos of your machine learning code for clients / collaborators / users
    • Getting feedback on model performance from users
    • Debugging your model interactively during development
  • Interfaces can be easily shared publicly by setting share=True in the launch() method.

Cecil #6: Use basketball stats to optimize game play with Visual Studio Code

Extras

Michael

Joke: They said containers would fix it


Audio Download

Posted on 28 April 2021 | 8:00 am


#230 PyMars? Yes! FLoC? No!

Watch the live stream:

Watch on YouTube

About the show

Sponsored by us! Support our work through:

Special guest: Peter Kazarinoff

Brian #1: calmcode.io

  • by Vincent D. Warmerdam
  • Suggested by Rens Dimmendaal
  • Great short intro tutorials & videos. Not deep dives, but not too shallow either.
  • Suggestions:
  • I watched the whole series on datasette this morning and learned how to
    • turn a csv data file into a sqlite database
    • use datasette to open a server to explore the data
    • filter the data
    • visualize the data with datasette-vega plugin and charting options
    • run arbitrary SQL, which is safe because it’s read-only
    • use it as an API that serves either CSV or JSON
    • deploy it to a cloud provider by wrapping it in a docker container and deploying that
    • add user authentication to protect the service
    • explore tons of available data sets that have been turned into live services with datasette
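Datasette is a CLI and server, but two steps in that list can be roughly imitated with just the standard library: turning a CSV file into a SQLite database, and running SQL safely against a read-only connection. This is a hypothetical stand-in, not how datasette is implemented. csv_to_sqlite and query_read_only are made-up names, and every value is stored as text.

```python
import csv
import io
import os
import sqlite3
import tempfile


def csv_to_sqlite(csv_text: str, db_path: str, table: str) -> None:
    """Create `table` in the SQLite file at db_path; the CSV header row gives the column names."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(f'CREATE TABLE "{table}" ({columns})')
            conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    finally:
        conn.close()


def query_read_only(db_path: str, sql: str) -> list:
    """Run SQL over a read-only connection; any write raises sqlite3.OperationalError."""
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


with tempfile.TemporaryDirectory() as tmp:
    db = os.path.join(tmp, "animals.db")
    csv_to_sqlite("name,legs\ncat,4\nbird,2\n", db, "animals")
    cats = query_read_only(db, "SELECT name FROM animals WHERE legs = '4'")
    try:
        query_read_only(db, "INSERT INTO animals VALUES ('ant', '6')")
        write_blocked = False
    except sqlite3.OperationalError:
        write_blocked = True  # the read-only connection refused the write
```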

Michael #2: Natural sort (aka natsort)

  • via Brian Skinn
  • Simple yet flexible natural sorting in Python.
  • Python’s default sort compares strings lexicographically, character by character
    >>> a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
    >>> sorted(a)
    ['1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '2 ft 7 in', '7 ft 6 in']
  • natsort provides a function, natsorted, that sorts lists “naturally”
    >>> from natsort import natsorted
    >>> natsorted(a)
    ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
  • Other things that can be sorted:
    • versions
    • file paths (via os_sorted)
    • signed floats (via realsorted)
  • Can go faster using fastnumbers
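natsort handles much more than this (versions, paths, floats), but the core idea of natural sorting fits in a standard-library key function. This is a simplified stand-in, not natsort's implementation. natural_key is a hypothetical name, and lowercasing the text parts is a choice made here to get case-insensitive ordering.

```python
import re


def natural_key(text: str) -> list:
    """Split text into alternating non-digit/digit runs so '10' compares as the number 10."""
    # re.split with a capture group always yields [text, digits, text, digits, ...],
    # so odd positions are digit runs and comparisons never mix str with int.
    return [int(part) if i % 2 else part.lower()
            for i, part in enumerate(re.split(r"(\d+)", text))]


a = ['2 ft 7 in', '1 ft 5 in', '10 ft 2 in', '2 ft 11 in', '7 ft 6 in']
print(sorted(a, key=natural_key))
# ['1 ft 5 in', '2 ft 7 in', '2 ft 11 in', '7 ft 6 in', '10 ft 2 in']
```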

Peter #3: Python controlling a helicopter on Mars.

Brian #4: Pydantic, FastAPI, Typer will all run on 3.10, 3.11, and into the future

  • suggested by an Angry Dwarf
  • It’s a bit of an emotional roller coaster this last week even for those of us on the sidelines watching. I’m sure it was even more so for those involved.
  • Short version:

    • Pydantic, FastAPI, Typer, etc. will continue to run as is in 3.10
    • Minor changes might be necessary in 3.11, but most likely all of us bystanders and users of these packages won’t even see the change, or we will be given specific instructions on what we need to change well ahead of time.
    • If things change in 3.11, your code might still work fine, and you can test that today if you are worried about it.
    • All project leads are involved and talking with the Steering Council.
    • The Steering Council has all of our interests, and Python’s, in mind and wants to make improvements to Python in a sane way.
    • So don’t freak out. Smart and kind people are involved and know what they are doing.
  • Slightly more detail that I don’t really want to read, summarized from my perspective:

    • Something about an existing PEP 563, titled Postponed Evaluation of Annotations
    • It was part of 3.7 and it included:
      • “In Python 3.10, function and variable annotations will no longer be evaluated at definition time. Instead, …”
    • This would have implications on Pydantic and projects using it and similar methods, like FastAPI, Typer, …
    • Panic ensues, people wringing their hands, bystanders confused.
    • BTW, the Python steering council knows what they are doing and is aware of all of this already. But lots of people jumped on the bandwagon anyway and freaked out.
    • Even I was thinking “Ugh. I use Typer and FastAPI, can I still use them in 3.10?”
    • Luckily, Sebastian Ramirez posted:
    • I've seen some incorrect conclusions that FastAPI and pydantic "can't be used with Python 3.10". Let's clear that up. In the worst-case scenario (which hasn't been decided), some corner cases would not work and require small refactors.
    • And if you are worried about the future of your own usage, you can use from __future__ import annotations to try the new system out. Thanks for that tip too, Sebastian.
    • Then there is this message by Thomas Wouters about PEP 563 and 3.10
    • “The Steering Council has considered the issue carefully, along with many of the proposed alternatives and solutions, and we’ve decided that at this point, we simply can’t risk the compatibility breakage of PEP 563. We need to roll back the change that made stringified annotations the default, at least for 3.10. (Pablo is already working on this.)”
    • “To be clear, we are not reverting PEP 563 itself. The future import will keep working like it did since Python 3.7. We’re delaying making PEP 563 string-based annotations the default until Python 3.11. This will give us time to find a solution that works for everyone (or to find a feasible upgrade path for users who currently rely on evaluated annotations). Some considerations that led us to this decision: …”
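Here is a small sketch of what PEP 563 changes, runnable on Python 3.7+. With the future import, annotations are kept as strings, so anything that needs real types at runtime must resolve those strings later. pydantic is one example of such a library. Point, move, and make_handler are made-up names. The locally defined class illustrates one kind of corner case; it is not a precise account of what breaks in pydantic.

```python
from __future__ import annotations  # PEP 563: annotations are stored as strings

import typing


class Point:
    pass


def move(p: Point, dx: int) -> Point:
    return p


def make_handler():
    # A class defined inside a function only exists in this local scope
    class Local:
        pass

    def handler(item: Local) -> None:
        return None

    return handler


# Annotations are no longer evaluated at definition time
print(move.__annotations__)  # {'p': 'Point', 'dx': 'int', 'return': 'Point'}

# Module-level names can still be resolved later, e.g. via typing.get_type_hints
print(typing.get_type_hints(move))

# ...but the string 'Local' can't be found in the module's globals
handler = make_handler()
try:
    typing.get_type_hints(handler)
except NameError as exc:
    print("could not resolve:", exc)
```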

Michael #5: Extra, Extra, Extra, Extra hear all about it

  • No social trackers on Python Bytes or Talk Python.
  • Python packages on Mars
  • More Mars
  • NordVPN and “going dark”
  • Nobody wants anything to do with Google's new tracking mechanism FLoC (Android Police, Ars Technica). From EFF: Google’s pitch to privacy advocates is that a world with FLoC will be better than the world we have today, where data brokers and ad-tech giants track and profile with impunity. But that framing is based on a false premise that we have to choose between “old tracking” and “new tracking.” It’s not either-or. Instead of re-inventing the tracking wheel, we should imagine a better world without the myriad problems of targeted ads.

Peter #6: Build Python books with Jupyter-Book

  • There are many static site generators for Python: Sphinx, Pelican, MkDocs…
  • Jupyter-Book is a static site generator that makes online books from Jupyter notebooks and markdown files. See the Jupyter-book docs.
  • Books can be published on GitHub pages and there is a GitHub action to automatically re-publish your book with each git push.
  • A gallery of Jupyter-books includes: Geographic Data Science with Python, Quantitative Economics with Python, the UW Data Visualization Curriculum, and a book on Algorithms for Automated Driving. All the books are free and online.

Extras

Brian

  • 2021 South African PyCon, PyConZA - https://za.pycon.org/. The conference will be held entirely online on 7 and 8 October.
  • deadpendency update. Within a day of us talking about deadpendency last week, the project maintainer added support for pyproject.toml. So projects using poetry or flit should work now. I imagine setuptools with pyproject.toml should also work.

Peter

Joke

More code comments

// Dear future me. Please forgive me.
// I can't even begin to express how sorry I am.
try {
     ...
} catch (SQLException ex) {
     // Basically, without saying too much, you're screwed. Royally and totally.
} catch(Exception ex){
     //If you thought you were screwed before, boy have I news for you!!!
}
// This is crap code but it's 3 a.m. and I need to get this working.

One more:

From TwoHardThings by Martin Fowler. The original saying: “There are only two hard things in Computer Science: cache invalidation and naming things.” -- Phil Karlton. Then there’s this tweet.


Audio Download

Posted on 21 April 2021 | 8:00 am