Python Bytes

by Michael Kennedy and Brian Okken

Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. The show is a short discussion on the headlines and noteworthy news in the Python, developer, and data science space.

  

Latest Episodes

#182 PSF Survey is out!

Sponsored by Datadog: pythonbytes.fm/datadog

Michael #1: PSF / JetBrains Survey

  • via Jose Nario
  • Let’s talk results:
  • 84% of people who use Python do so as their primary language [unchanged]
  • Other languages: JavaScript (down), Bash (down), HTML (down), C++ (down)
  • Web vs Data Science languages:
    • More C++ / Java / R / C# on Data Science side
    • More SQL / JavaScript / HTML on the Web side
  • Why do you mainly use Python? 58% work and personal
  • What do you use Python for?
    • The average number of answers was 3.9
    • Data analysis [59% / 59% — now vs. last year]
    • Web Development [51% / 55%]
    • ML [40% / 39%]
    • DevOps [39% / 43%]
  • What do you use Python for the most?
    • Web [28% / 29%]
    • Data analysis [18% / 17%]
    • Machine Learning [13% / 11%]
  • Python 3 vs Python 2: 90% Python 3, 10% Python 2
  • Widest disparity of versions (pro 3) is in data science.
  • Web Frameworks:
    • Flask [48%]
    • Django [44%]
  • Data Science
    • NumPy 63%
    • Pandas 55%
    • Matplotlib 46%
  • Testing
    • pytest 49%
    • unittest 30%
    • none 34%
  • Cloud
    • AWS 55%
    • Google 33%
    • DigitalOcean 22%
    • Heroku 20%
    • Azure 19%
  • How do you run code in the cloud (in the production environment)?
    • Containers 47%
    • VMs 46%
    • PAAS 25%
  • Editors
    • PyCharm 33%
    • VS Code 24%
    • Vim 9%
  • tool use
    • version control 90%
    • write tests 80%
    • code linting 80%
    • use type hints 65%
    • code coverage 52%

Brian #2: Hypermodern Python

  • Claudio Jolowicz, @cjolowicz
  • An opinionated and fun tour of Python development practices.
  • Chapter 1: Setup
    • Setup a project with pyenv and Poetry, src layout, virtual environments, dependency management, click for CLI, using requests for a REST API.
  • Chapter 2: Testing
    • Unit testing with pytest, using coverage.py, nox for automation, pytest-mock. Plus refactoring, handling exceptions, fakes, end-to-end testing opinions.
  • Chapter 3: Linting
    • Flake8, Black, import-order, bugbear, bandit, Safety. Plus more on managing dependencies, and using pre-commit for git hooks.
  • Chapter 4: Typing
    • mypy and pytype, adding annotations, data validation with Desert & Marshmallow, Typeguard, flake8-annotations, adding checks to test suite
  • Chapter 5: Documentation
    • docstrings, linting docstrings, docstrings in nox sessions and test suites, darglint, xdoctest, Sphinx, reStructuredText, and autodoc
  • Chapter 6: CI/CD
    • CI with GitHub Actions, reporting coverage with Codecov, uploading to PyPI, Release Drafter for release documentation, single-sourcing the package version, using TestPyPI, docs on RTD
  • The series is worth it even for just the artwork.
  • Lots of fun tools to try, lots to learn.

Michael #3: Open AI Jukebox

  • via Dan Bader
  • Listen to the songs under “Curated samples.”
  • A neural net that generates music, including rudimentary singing, as raw audio in a variety of genres and artist styles.
  • Code is available on github.
  • Dataset: To train this model, we crawled the web to curate a new dataset of 1.2 million songs (600,000 of which are in English), paired with the corresponding lyrics and metadata from LyricWiki.
  • The top-level transformer is trained on the task of predicting compressed audio tokens. We can provide additional information, such as the artist and genre for each song.
  • Two advantages: first, it reduces the entropy of the audio prediction, so the model is able to achieve better quality in any particular style; second, at generation time, we are able to steer the model to generate in a style of our choosing.

Brian #4: The Curious Case of Python's Context Manager

  • Redowan Delowar, @rednafi
  • A quick tour of context managers that goes deeper than most introductions.
  • Writing custom context managers with __init__, __enter__, __exit__.
  • Using the decorator contextlib.contextmanager
  • Then it gets even more fun
    • Context managers as decorators
    • Nesting contexts within one with statement.
    • Combining context managers into new ones
  • Examples
    • Context managers for SQLAlchemy sessions
    • Context managers for exception handling
    • Persistent parameters across http requests
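  • A minimal sketch of the two authoring styles above (class-based and contextlib.contextmanager) and nesting them in one with statement; the Timer and suppress_and_log examples are made up for illustration:
    import time
    from contextlib import contextmanager

    class Timer:
        """Class-based context manager using __enter__/__exit__."""
        def __enter__(self):
            self.start = time.perf_counter()
            return self
        def __exit__(self, exc_type, exc, tb):
            self.elapsed = time.perf_counter() - self.start
            return False  # don't swallow exceptions

    @contextmanager
    def suppress_and_log(exc_type):
        """Generator-based context manager via contextlib.contextmanager."""
        try:
            yield
        except exc_type as e:
            print(f"suppressed: {e}")

    # Nesting both in a single with statement:
    with Timer() as t, suppress_and_log(ValueError):
        int("not a number")
    print(t.elapsed)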

Michael #5: nbstripout

  • via Clément Robert
  • In the latest episode, you praised NBDev for having a git hook that strips out notebook outputs.
  • strip output from Jupyter and IPython notebooks
  • Opens a notebook, strips its output, and writes the outputless version to the original file.
  • Useful mainly as a git filter or pre-commit hook for users who don’t want to track output in VCS.
  • This does mostly the same thing as the Clear All Output command in the notebook UI.
  • Has a nice youtube tutorial right in the pypi listing
  • Just do nbstripout --install in a git repo!

Brian #6: Write ups for The 2020 Python Language Summit

Also, another way to get involved is to become a member of the PSF board of directors

Extras:

Michael:

  • Updated search engine for better result ranking
  • Windel Bouwman wrote a nice little script for speedscope https://github.com/windelbouwman/pyspeedscope (follow up from Austin profiler)

Jokes:

  • “Due to social distancing, I wonder how many projects are migrating to UDP and away from TLS to avoid all the handshakes?” - From Sviatoslav Sydorenko
  • “A chef and a vagrant walk into a bar. Within a few seconds, it was identical to the last bar they went to.” - From Benjamin Jones, crediting @lufcraft
  • Understanding both of these jokes is left as an exercise for the reader.


Audio Download

Posted on 19 May 2020 | 8:00 am


#181 It's time to interrogate your Python code

Sponsored by Datadog: pythonbytes.fm/datadog


Brian #1: interrogate: checks your code base for missing docstrings

  • Suggested by Herbert Beemster
  • Written and Maintained by Lynn Root, @roguelynn
  • Having docstrings helps you understand code.
  • They can be on methods, functions, classes, and modules, and even packages, if you put a docstring in __init__.py files.
  • I love how editors like VS Code and PyCharm use docstrings: hover over a function call and a popup shows up that includes the function’s docstring.
  • Other tools like Sphinx, pydoc, docutils can generate documentation with the help of docstrings.
  • But how good is your project at including docstrings?
  • interrogate is a command line tool that checks your code to make sure everything has docstrings. Neato.
  • What’s missing? -vv will tell you which pieces are covered and not.
  • Don’t want to have everything forced to include docstrings? There are options to select what needs a docstring and what doesn’t.
  • Also can be incorporated into tox testing, and CI workflows.

Michael #2: Streamlit: Turn Python Scripts into Beautiful ML Tools

  • via Daniel Hoadley
  • Many folks come to Python from “scripting” angles
  • The gap between that and interactive, high perf SPA web apps is gigantic
  • Streamlit lets you build these as if they were imperative top-to-bottom code
  • Really neat tricks make callbacks act like blocking methods
  • Use existing data science toolkits
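  • A minimal sketch of that imperative style (a throwaway demo script; run it with streamlit run app.py):
    import pandas as pd
    import streamlit as st

    st.title("Demo")
    rows = st.slider("How many rows?", 1, 100, 10)  # reads like a blocking call; the script reruns on change
    df = pd.DataFrame({"x": range(rows), "y": [i ** 2 for i in range(rows)]})
    st.line_chart(df.set_index("x"))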

Brian #3: Why You Should Document Your Tests

  • Hynek Schlawack, @hynek
  • All test_ methods should include a docstring telling you or someone else the what and why of the test.
  • The test name should be descriptive, and the code should be clear. But still, you can get confused in the future.
  • Hynek includes a great example of a simple test whose purpose isn’t obvious, because the test checks for a side effect of an action.
  • “This is quite common in testing: very often, you can’t ask questions directly. Instead you verify certain properties that prove that your code is achieving its goals.”
  • “If you don’t explain what you’re actually testing, you force the reader (possibly future you) to deduce the main intent by looking at all of its properties. This makes it tiring and time-consuming to quickly scan a file for a certain test or to understand what you’ve actually broken if a test starts failing.”
  • Want to make sure all of your test methods have docstrings?
    • interrogate -vv --fail-under 100 --whitelist-regex "test_.*" tests will do the trick.
  • See also: How to write docstrings for tests

Michael #4: HoloViz project

  • HoloViz is a coordinated effort to make browser-based data visualization in Python easier to use, easier to learn, and more powerful.
  • HoloViz provides:
    • High-level tools that make it easier to apply Python plotting libraries to your data.
    • A comprehensive tutorial showing how to use the available tools together to do a wide range of different tasks.
    • A Conda metapackage "holoviz" that makes it simple to install matching versions of libraries that work well together.
    • Sample datasets to work with.
  • Comprised of a bunch of cool independent projects
  • Panel for making apps and dashboards for your plots from any supported plotting library
  • hvPlot to quickly generate interactive plots from your data
  • HoloViews to help you make all of your data instantly visualizable
  • GeoViews to extend HoloViews for geographic data
  • Datashader for rendering even the largest datasets
  • Param to create declarative user-configurable objects
  • Colorcet for perceptually uniform colormaps.
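  • For a taste of the hvPlot piece, it hooks a .hvplot accessor onto pandas objects (a small sketch with a throwaway DataFrame):
    import pandas as pd
    import hvplot.pandas  # noqa: importing registers the .hvplot accessor on DataFrames/Series

    df = pd.DataFrame({"x": range(10), "y": [i ** 2 for i in range(10)]})
    plot = df.hvplot.line(x="x", y="y")  # returns a HoloViews object; renders interactively in a notebook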

Brian #5: A cool new progress bar for python

  • Rogério Sampaio, @rsalmei
  • project: alive-progress
  • Way cool CLI progress bars with or without spinners
  • Clean coding interface.
  • Fun features and options like sequential framing, scrolling, bouncing, delays, pausing and restarting.
  • Repo README notes:
    • Great animations in the README. (we love this)
    • “To do” list, encourages contributions
    • “Interesting facts”
      • functional style
      • extensive use of closures and generators
      • no dependencies
  • “Changelog highlights”
    • I love this. 1-2 lines of semicolon separated features added per version.
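  • A minimal sketch of the interface (a fake work loop, just to show the API):
    import time
    from alive_progress import alive_bar

    with alive_bar(1000) as bar:   # total number of expected items
        for _ in range(1000):
            time.sleep(0.001)      # pretend to do some work
            bar()                  # advance the bar by one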

Michael #6: Awesome Panel

  • by Marc Skov Madsen
  • Awesome Panel Project is to share knowledge on how awesome Panel is and can become.
  • A curated list of awesome Panel resources.
  • A gallery of awesome panel applications.
  • The app itself is a best-practice, multi-page app with a nice layout, developed in Panel.
  • Kind of meta as it’s built with Panel. :)
  • Browse the gallery to get a sense of what it can do

Extras:

Michael:

Brian:


Joke:

O’Really book covers


Audio Download

Posted on 14 May 2020 | 8:00 am


#180 Transactional file IO with Python and safer

Sponsored by DigitalOcean: pythonbytes.fm/digitalocean - $100 credit for new users to build something awesome.

Michael #1: Ubuntu 20.04 is out!

  • The first LTS release since 18.04, which came out on 26 April 2018.
  • Comes with Python 3.8 included!
  • Already upgraded all our servers, super smooth.
  • Kernel has been updated to the 5.4 based Linux kernel, with additional support for Wireguard VPN, AUFS5, and improved support for IBM, Intel, Raspberry Pi and AMD hardware.
  • Features the latest version of the GNOME desktop environment.
  • Brings support for installing an Ubuntu desktop system on top of ZFS.
  • 20.04 already an option on DigitalOcean ;)

Brian #2: Working with warnings in Python

  • (Or: When is an exception not an exception?)
  • Reuven Lerner
  • Exceptions, the class hierarchy of exceptions, and warnings.
  • “… most of the time, warnings are aimed at developers rather than users. Warnings in Python are sort of like the “service needed” light on a car; the user might know that something is wrong, but only a qualified repairperson will know what to do. Developers should avoid showing warnings to end users.”
  • Python’s warning system …:
    • It treats the warnings as a separate type of output, so that we cannot confuse it with either exceptions or the program’s printed text,
    • It lets us indicate what kind of warning we’re sending the user,
    • It lets the user indicate what should happen with different types of warnings, with some causing fatal errors, others displaying their messages on the screen, and still others being ignored,
    • It lets programmers develop their own, new kinds of warnings.
  • Reuven goes on to show how to use warnings in your code.
    • using them
    • creating custom warnings
    • filtering
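  • A small sketch of issuing and filtering a custom warning (the DeprecatedConfigWarning class and load_config function are made up for illustration):
    import warnings

    class DeprecatedConfigWarning(UserWarning):
        """Hypothetical custom warning category."""

    def load_config(path):
        warnings.warn(f"{path} uses the legacy config format", DeprecatedConfigWarning)
        return {}

    warnings.simplefilter("always", DeprecatedConfigWarning)  # always show it
    load_config("old.ini")
    warnings.simplefilter("error", DeprecatedConfigWarning)   # or turn it into an exception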

Michael #3: Safer file writer

  • Start from typical file-writing code:
    with open(filename, 'w') as fp:
        json.dump(data, fp)
  • It’s using with, so it’s good, right?
  • Well, if an error happens mid-write, the file may already be partially overwritten and left corrupted.
  • With safer, you write almost identical code:
    with safer.open(filename, 'w') as fp:
        json.dump(data, fp)

Brian #4: codespell

  • codespell : Fix common misspellings in text files. It's designed primarily for checking misspelled words in source code, but it can be used with other files as well.
  • I got a cool pull request against the cards project to add a pre-commit hook to run codespell. (Thanks Christian Clauss)
  • codespell caught a documentation spelling error in cards, where I had spelled “arguments” as “arguements”. Oops.
  • Spelling errors in code and comments are annoying, embarrassing, and distracting. They’re also hard to deal with using traditional spell checkers. So super glad this is a thing.

Michael #5: Austin profiler

  • via Anthony Shaw
  • Python frame stack sampler for CPython
  • Profiles CPU and Memory!
  • Why Austin?
    • Written in pure C: there are no dependencies on third-party libraries.
    • Just a sampler - fast: Austin is just a frame stack sampler. It looks into a running Python application at regular intervals of time and dumps whatever frame stack it finds.
    • Simple output, powerful tools: Austin uses the collapsed stack format of FlameGraph, which is easy to parse. You can then go and build your own tool to analyse Austin's output.
    • You could even make a player that replays the application execution in slow motion, so that you can see what has happened in temporal order.
    • Small size: Austin compiles to a single binary executable of just a handful of KB.
    • Easy to maintain: occasionally, the Python C API changes and Austin will need to be adjusted to new releases. However, given that Austin, like CPython, is written in C, implementing the new changes is rather straightforward.
  • Creates nice flame graphs
  • The Austin TUI is nice!

  • Web Austin is yet another example of how to use Austin to make a profiling tool. It makes use of d3-flame-graph to display a live flame graph in the web browser that refreshes every 3 seconds with newly collected samples.

  • Austin output format can be converted easily into the Speedscope JSON format. You can find a sample utility along with the TUI and Austin Web.

Brian #6: Numbers in Python

  • Moshe Zadka
  • A great article on integers, floats, fractions, & decimals
  • Integers
    • They turn into floats very easily: (4/3)*3 gives 4.0 (int → float)
  • Floats
    • don’t behave like the floating point numbers in theory
    • don’t obey mathematical properties
      • subtraction and addition are not inverses
        • 0.1 + 0.2 - 0.2 - 0.1 != 0.0
      • addition is not associative
    • My added comment: Don’t compare floats with ==; use pytest.approx or other approximation techniques (see the sketch at the end of this list).
  • Fractions
    • Kinda cool that they are there but be very careful about your input
    • Algorithms on fractions can explode in time and to some extent memory.
    • Generally better to use floats
  • Decimals
    • Good for financial transactions.
    • Weird dependence on a global state variable, the context precision.
    • Safer to use a local context to set the precision locally
    >>> from decimal import Decimal, localcontext
    >>> with localcontext() as ctx:
    ...     ctx.prec = 10
    ...     Decimal(1) / Decimal(7)
    ...
    Decimal('0.1428571429')
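  • A quick sketch of the float-comparison advice under Floats above:
    import math
    import pytest

    x = 0.1 + 0.2 - 0.2 - 0.1
    assert x != 0.0                            # exact == comparison fails
    assert x == pytest.approx(0.0, abs=1e-12)  # approximate comparison passes
    assert math.isclose(x + 1.0, 1.0)          # stdlib alternative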

Extras:

Brian:

Michael:

Joke:

Unix is user friendly. It's just very particular about who its friends are. (via PyJoke)

If you put 1000 monkeys at 1000 computers eventually one will write a Python program. The rest will write PERL. (via @JamesAbel)


Audio Download

Posted on 8 May 2020 | 8:00 am


#179 Guido van Rossum drops in on Python Bytes

Sponsored by Datadog: pythonbytes.fm/datadog

Special guest: Guido van Rossum


Brian #1: New governance model for the Django project

  • James Bennett on DjangoProject Blog
  • DEP 10 (Django Enhancement Proposal)
  • Looks like it’s been in the making since at least 2018
  • The specifics are definitely interesting
    • “core team” dissolved
    • new role, “merger” with commit access only for merging pull requests.
      • hold no decision making privileges
    • technical decisions made in public venues
    • “technical board” kept where necessary, but historically it’s rare.
      • no longer elected by committers, but anyone can run and be elected by DSF individual members.
  • More interesting to me is the rationale
    • Grow the set of people contributing to Django
    • Remove the barriers to participation
    • Looking at how decisions are made anyway historically, by reviewing pull requests, and merges done by “Fellows”, paid contractors of the DSF.
  • Specifically, taking into account the specifics of the current state of participation in Django, trying to set it up for inclusion and growth in the future, and the specifics of this project. Not trying to clone the governance of a different project.

Michael #2: missingno

  • Missing data visualization module for Python.
  • A small toolset of flexible and easy-to-use missing data visualizations
  • Quick visual summary of the completeness (or lack thereof) of your dataset
  • Just call msno.matrix(collisions.sample(250)) and here’s what you’ll see:

  • The sparkline at right summarizes the general shape of the data completeness and points out the rows with the maximum and minimum nullity in the dataset.
  • Other visualizations are available (heat maps, bar charts, etc)
  • The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
  • The dendrogram uses a hierarchical clustering algorithm (courtesy of scipy) to bin variables against one another by their nullity correlation.
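  • A small sketch of the calls mentioned above, assuming a hypothetical collisions.csv with missing values:
    import pandas as pd
    import missingno as msno
    import matplotlib.pyplot as plt

    collisions = pd.read_csv("collisions.csv")   # hypothetical dataset path
    msno.matrix(collisions.sample(250))          # the matrix + sparkline view
    msno.bar(collisions)                         # bar chart of non-null counts
    msno.heatmap(collisions)                     # nullity correlation heatmap
    msno.dendrogram(collisions)                  # hierarchical clustering of nullity
    plt.show()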

Guido #3: Announcements from the language summit.


Brian #4: Codes of Conduct and Enforcement

  • I’ve been thinking about this a lot lately. No reason. Just interesting topic, I think.
  • Interesting the differences in CoC and enforcement clauses of different projects based on the types of interaction most likely to need enforcement.
  • Two examples
    • PSF
      • Scope (focus seems to be first on events, second on online)
      • PSF Code of Conduct
        • being open
        • focus on what’s best for the community
        • acknowledging time and effort
        • being respectful of different viewpoints and experiences
        • showing empathy towards other community members
        • being considerate
        • being respectful
        • gracefully accepting constructive criticism
        • using welcoming and inclusive language
          • list of inappropriate behavior
    • PSF CoC Enforcement Procedures
      • 2/3 majority vote among non-conflicted work group members.
      • Process for disagreement of the work group
    • Django
      • Scope (focus on online spaces, events seem to be covered elsewhere)
      • Django Code of Conduct
        • be friendly and patient
        • be welcoming
        • be considerate
        • be respectful
        • be careful in the words you choose
        • Includes examples of harassment and exclusionary behavior that isn’t acceptable.
      • when we disagree try to understand why
    • Django CoC Enforcement Manual
      • Resolution timelines in place. Aiming for resolution within a week.
      • Unilateral authority: Any committee member may act immediately (before consensus) to end the situation if the act is ongoing or threatening.
      • Otherwise, consensus must be reached.
      • Otherwise, it’s turned over to the DSF board for resolution.
  • Differences are interesting
    • The focus on online interactions, and the Django push to get more people involved, I think explain the need for really fast reaction times to problems, followed by trying to reach consensus.
    • The ability to bump the decision up to the DSF is interesting too.
    • Also the 2/3 vs consensus.
  • For other projects
    • Look at these two examples, why they are different, and what they share: the needs for inclusion and growth of more developers, online vs. events, etc., before deciding how to enforce a CoC on your project.
    • Enforcement, quick enforcement, and a public statement of what enforcement looks like all seem really important. Don’t ignore it. Figure out the process before you have to use it.

Michael #5: Myths about Indentation

  • Python can come across as a funky language using spacing, not { } for code blocks
  • So let’s talk about some myths
  • #1 Whitespace is significant in Python source code.
    • No, not in general. Only the indentation level of your statements is significant (i.e. the whitespace at the very left of your statements).
    • Everywhere else, whitespace is not significant and can be used as you like, just like in any other language.
    • The exact amount of indentation doesn't matter at all, but only the relative indentation of nested blocks (relative to each other).
    • Furthermore, the indentation level is ignored when you use explicit or implicit continuation lines.
    # For example:
    >>> foo = [
    ...            'some string',
    ...         'another string',
    ...           'short string'
    ... ]
  • #2 Python forces me to use a certain indentation style
    • Yes and no. You can write the inner block all on one line if you like, therefore not having to care about indentation at all. These are equivalent
    >>> if 1 + 1 == 2:
    ...     print("foo")
    ...     print("bar")
    ...     x = 42

    >>> if 1 + 1 == 2:
    ...     print("foo"); print("bar"); x = 42

    >>> if 1 + 1 == 2: print("foo"); print("bar"); x = 42 
  • If you decide to write the block on separate lines, then yes, Python forces you to obey its indentation rules
  • The conclusion is: Python forces you to use indentation that you would have used anyway, unless you wanted to obfuscate the structure of the program.
  • Ever seen C code like this?
if (some condition)
        if (another condition)
                do_something(fancy);
else
        this_sucks(badluck); 
  • Either the indentation is wrong, or the program is buggy. In Python, this error cannot occur. The program always does what you expect when you look at the indentation.
  • #3 You cannot safely mix tabs and spaces in Python
    • That's right, and you don't want that.
    • Most good editors support transparent translation of tabs, automatic indent and dedent.
    • It's behaving like you would expect a tab key to do, but still maintaining portability by using spaces in the file only. This is convenient and safe.
  • #4 I just don't like it - That's perfectly OK; you're free to dislike it - But it does have a lot of advantages, and you get used to it very quickly when you seriously start programming in Python.
  • #5 How does the compiler parse the indentation
    • The parsing is well-defined and quite simple.
    • Basically, changes to the indentation level are inserted as tokens into the token stream.
    • After the lexical analysis (before parsing starts), there is no whitespace left in the list of tokens (except possibly within string literals, of course). In other words, the indentation is handled by the lexer, not by the parser.

Guido #6: Parsers and LibCST

  • https://github.com/Instagram/LibCST
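  • A tiny sketch of what LibCST gives you (a concrete syntax tree that round-trips source exactly, comments and whitespace included):
    import libcst as cst

    module = cst.parse_module("x = 1  # keep my comment\n")
    print(module.code)                    # prints the source back, comment intact
    print(cst.parse_expression("1 + 2"))  # nodes are plain, inspectable objects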


Extras:

Michael:

  • Django no longer supports Python 2 AT ALL (via Adam (Codependent Codr)). April 1st this year, the 1.11 line of Django has left Long Term Support (LTS). Leaving only 2.2.12+ with exclusively Python 3 support.
  • Quick follow up on “Coding is Googling”. I went through a recent blip of mad googling.

Brian:

  • Gotta get my talk recorded this week, deadlines Friday. A little worried. As a writer and developer, me and deadlines don’t always see eye to eye.
  • Follow-ups from previous episodes:
    • Got lots of help with my Mac / Windows problem and modifier keys. Thanks everyone. Simplest solution Apple→System Prefs→Keyboard→Modifier Keys, and swap control and command for my external keyboard. So far, so good.
    • You can’t use the setuptools_scm trick to get github actions to automatically publish to Test PyPI or PyPI for Flit or Poetry projects, since the version number is a simple string in the repo. Would love to hear if anyone has a solution to this one. Otherwise I’m fine with a make or tox snippet for publishing that combines bumping the version.

Guido:

  • PyCon goes online.
  • Python 2.7.18 was released, the last Python 2 release ever.

Joke:

Via https://twitter.com/derchambers/status/1226760532763410432

How can you borrow more money at the same time? With asyncIOUs!


Audio Download

Posted on 30 April 2020 | 8:00 am


#178 Build a PyPI package from a Jupyter notebook

This episode is brought to you by Digital Ocean: pythonbytes.fm/digitalocean

YouTube is going strong over at pythonbytes.fm/youtube

Michael #1: Python String Format Website

  • by Lachlan Eagling
  • Have you ever forgotten the arguments to datetime.strftime()?
  • Quick: What’s the format for Wed April 15, 10:30am?
  • I don’t know but the site says '%a %B %H, %M:%Sam' and it’s right!

Brian #2: Pandas-Bokeh

  • Suggested by Jack McKew
  • “Pandas-Bokeh provides a Bokeh plotting backend for Pandas, GeoPandas and Pyspark DataFrames, similar to the already existing Visualization feature of Pandas. Importing the library adds a complementary plotting method plot_bokeh() on DataFrames and Series.”
  • “With Pandas-Bokeh, creating stunning, interactive, HTML-based visualization is as easy as calling: df.plot_bokeh()"
  • You can also switch the default plotting of pandas to Bokeh with pd.set_option('plotting.backend', 'pandas_bokeh')
  • This interface looks a lot easier to me than dealing with figures and plots and show() calls and such.
  • Lots of options, and all collected in parameters to the plot call.
  • Can also export a notebook or a standalone html file.
  • Plus, the combined install of pip install pandas-bokeh pulls in everything you need.
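  • A short sketch of the calls mentioned above, with a throwaway DataFrame:
    import pandas as pd
    import pandas_bokeh

    pandas_bokeh.output_file("plot.html")  # or pandas_bokeh.output_notebook() inside Jupyter
    df = pd.DataFrame({"x": range(10), "y": [i ** 2 for i in range(10)]})
    df.plot_bokeh(kind="line", x="x", y="y", title="Interactive line plot")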

Michael #3: NBDev

  • nbdev is a library that allows you to fully develop a library in Jupyter Notebooks, putting all your code, tests and documentation in one place.
  • That is: you now have a true literate programming environment, as envisioned by Donald Knuth back in 1983!
  • This seems to be a massive upgrade for notebooks and related tooling
  • Creates Python packages out of a notebook
  • Creates documentation from the notebook
  • Solves the git perma-conflict issues with git pre-commit hooks
  • Use #export to declare a cell should become a function in the package
  • Manages the boilerplate issues for creating Python packages (setup.py, etc)
  • Makes testing possible inside notebooks
  • Navigate and edit your code in a standard text editor or IDE, and sync any changes automatically back into your notebooks (reverse basically)
  • Follow getting started instructions.
  • Docs render slightly better at nbdev.fast.ai

Brian #4: Stop naming your python modules “utils”

  • Sebastian Buczyński, @EnforcerPL
  • Lots of projects, public and private, end up having a utils.py.
  • “utils is arguably one of the worst names for a module because it is very blurry and imprecise. Such a name does not say what is the purpose of code inside. On the contrary, a utils module can as well contain almost anything. By naming a module utils, a software developer lays down perfect conditions for an incohesive code blob. Since the module name does not hint team members if something fits there or not, it is likely that unrelated code will eventually appear there, as more utils.”
  • one occurrence of misbehavior invites more of them
    • I have seen this in action. I’ve put 2-3 hard-to-classify functions, used in lots of modules, into a utils.py, only to come back a few months later and find a couple dozen completely unrelated functions there, now that the team had a junk drawer to throw things into.
  • Excuses:
    • It’s just one function
    • There is no other place to put this code
    • I need a place for company commons
    • But Django does it
  • Instead:
    • Try naming based on role of the code or group functions by theme.
    • If you see a utils.py crop up in a code review, request that it be renamed or split and renamed.

Michael #5: Scalene

  • A high-performance, high-precision CPU and memory profiler for Python
  • It runs orders of magnitude faster than other profilers while delivering far more detailed information.
  • Scalene is fast. It uses sampling instead of instrumentation or relying on Python's tracing facilities. Its overhead is typically no more than 10-20% (and often less).
  • Scalene is precise. Unlike most other Python profilers, Scalene performs CPU profiling at the line level, pointing to the specific lines of code that are responsible for the execution time in your program.
  • Scalene separates out time spent running in Python from time spent in native code (including libraries).
  • Scalene profiles memory usage. In addition to tracking CPU usage, Scalene also points to the specific lines of code responsible for memory growth. It accomplishes this via an included specialized memory allocator.
    • Requires a special install, not just pip (see the brew install instructions in the docs)
  • Scalene profiles copying volume, making it easy to spot inadvertent copying, especially due to crossing Python/library boundaries (e.g., accidentally converting numpy arrays into Python arrays, and vice versa).
  • See the performance comparison chart.
  • Would be nice to have integrated in the editors (PyCharm and VS Code)

Brian #6: From 1 to 10,000 test cases in under an hour: A beginner's guide to property-based testing

  • Carolyn Stransky, @carolynstran
  • Excellent intro to property based testing and hypothesis
  • Starts with a unit test that uses example based testing.
  • Before showing similar test using hypothesis, she talks about the different mindset of testing for properties instead of exact examples.
    • For example, not checking for the exact sorted list you should get back,
    • but instead:
      • the length should be the same
      • the contents should contain the same things, for instance, using set for that assertion
      • you could element-wise walk the list and make sure each element is <= the next one (list[i] <= list[i+1])
  • She walks through the hypothesis decorators to come up with input and shows how to use some.lists and some.integers and max_examples
  • Goes on to discuss coming up with properties to test for, which really is the hard part of property based testing.
  • Checking for expected exceptions
  • Using a naive method technique, useful in property based testing, to compare two versions of a method. This is super useful for refactoring and testing new vs old versions on tons of input data.
  • json5 lib
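  • A small sketch of the sort-function properties described above (my_sort is a stand-in for the code under test):
    from hypothesis import given, settings
    from hypothesis import strategies as some

    def my_sort(items):
        return sorted(items)   # stand-in for your own implementation

    @given(some.lists(some.integers()))
    @settings(max_examples=200)
    def test_sort_properties(items):
        result = my_sort(items)
        assert len(result) == len(items)                        # same length
        assert set(result) == set(items)                        # same contents
        assert all(a <= b for a, b in zip(result, result[1:]))  # each element <= the next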

Extras

Joke

PyJoke delivers:

How many QAs does it take to change a lightbulb? They noticed that the room was dark. They don't fix problems, they find them.


Audio Download

Posted on 22 April 2020 | 8:00 am


#177 Coding is 90% Google searching or is it?

Sponsored by Datadog: pythonbytes.fm/datadog

We’re launching a YouTube Project: pythonbytes.fm/youtube

Brian #1: Announcing a new Sponsorship Program for Python Packaging

  • “The Packaging Working Group of the Python Software Foundation is launching an all-new sponsorship program to sustain and improve Python's packaging ecosystem. Funds raised through this program will go directly towards improving the tools that your company uses every day and sustaining the continued operation of the Python Package Index.”
  • Improvements since 2017, as a result of one time grants, a contract, and a gift:
    • relaunch PyPI in 2018
    • added security features in 2019
    • improve support for users with disabilities and multiple locales in 2019
    • security features in 2019, 2020
    • pip & dependency resolver in 2020
  • Let’s keep it going
    • We use PyPI every day
    • We need packaging to keep getting better
  • You, and your company, can sponsor. View the prospectus, apply to sponsor, or ask questions.
  • Individuals can also donate.

Michael #2: energy-usage

  • A Python package that measures the environmental impact of computation.
  • Provides a function to evaluate the energy usage and related carbon emissions of another function.
  • Emissions are calculated based on the user's location via the GeoJS API and that location's energy mix data (sources: US E.I.A and eGRID for the year 2016).
  • Can save report to PDF, run silently, etc.
  • Only runs on Linux

Brian #3: Coding is 90% Google Searching — A Brief Note for Beginners

  • Colin Warn
  • Short article, mostly chosen to discuss the topic.
  • Michael & Brian disagree, so, what’s wrong with this statement?

Michael #4: Using WSL to Build a Python Development Environment on Windows

  • Article by Chris Moffitt
  • VMs aren’t fair to Windows (or macOS or …)
  • But you need to test on linux-y systems! Enter WSL.
  • In 2016, Microsoft launched Windows Subsystem for Linux (WSL) which brought robust unix functionality to Windows.
  • May 2019, Microsoft announced the release of WSL 2 which includes an updated architecture that improved many aspects of WSL - especially file system performance.
  • Check out Chris’ article for
    • What is WSL and why you may want to install and use it on your system?
    • Instructions for installing WSL 2 and some helper apps to make development more streamlined.
    • How to use this new capability to work effectively with python in a combined Windows and Linux environment.
  • The main advantage of WSL 2 is the efficient use of system resources.
  • Running a very minimal subset of Hyper-V features and only using minimal resources when not running.
  • Takes about 1 second to start.
  • The other benefit of this arrangement is that you can easily copy files between the virtual environment and your base Windows system.
  • Get the most out of this with VS Code +

Brian #5: A Pythonic Guide to SOLID Design Principles

  • Derek D
  • Again, mostly including this as a discussion point
  • But for reference, here’s the decoder
    • Single Responsibility Principle
      • Every module/class should only have one responsibility and therefore only one reason to change.
    • Open Closed Principle
      • Software Entities (classes, functions, modules) should be open for extension but closed to change.
    • Liskov Substitution Principle
      • If S is a subtype of T, then objects of type T may be replaced with objects of Type S.
    • Interface Segregation Principle
      • A client should not depend on methods it does not use.
    • Dependency Inversion Principle
      • High-level modules should not depend on low-level modules. They should depend on abstractions and abstractions should not depend on details, rather details should depend on abstractions.
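  • A small, hand-rolled sketch of that last one (Dependency Inversion), with both the high-level Notifier and the low-level EmailSender depending on an abstraction; the class names are made up:
    from abc import ABC, abstractmethod

    class MessageSender(ABC):              # the abstraction
        @abstractmethod
        def send(self, text: str) -> None: ...

    class EmailSender(MessageSender):      # low-level detail depends on the abstraction
        def send(self, text: str) -> None:
            print(f"emailing: {text}")

    class Notifier:                        # high-level module depends on the abstraction too
        def __init__(self, sender: MessageSender) -> None:
            self.sender = sender

        def alert(self, text: str) -> None:
            self.sender.send(text)

    Notifier(EmailSender()).alert("build failed")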

Michael #6: Types for Python HTTP APIs: An Instagram Story

  • Let’s talk about Typed HTTP endpoints
  • Instagram has a few (thousand!) on a single Django app
  • We can have data access layers with type annotations, but how do these manifest in HTTP endpoints?
  • Instagram has a cool api_view decorator to “upgrade” regular typed methods to HTTP endpoints.
  • For data exchange, dataclasses are nice, they have types, they have type validation, they are immutable via frozen.
  • But some code is old and crusty, so TypedDict out of mypy allows raw dict usage with validation still.
  • OpenAPI can be used for very nice documentation generation.
  • Comments are super interesting. Suggesting pydantic, fastapi, and more. But that all ignores the massive legacy code story.
  • But one is helpful and suggests Schemathesis: A tool for testing your web applications built with Open API / Swagger specifications.
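  • A rough sketch of the two data shapes mentioned above (a frozen dataclass for new code, a TypedDict for dict-heavy legacy code); the names here are illustrative, not Instagram's actual api_view API:
    from dataclasses import dataclass
    from typing import TypedDict

    @dataclass(frozen=True)
    class SignupResponse:                  # typed and immutable (frozen)
        user_id: int
        username: str

    class SignupResponseDict(TypedDict):   # lets mypy check plain-dict code
        user_id: int
        username: str

    def handle_signup() -> SignupResponse:
        return SignupResponse(user_id=1, username="mkennedy")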

Extras:

Michael:

Joke:

"How many programmers does it take to kill a cockroach? Two: one holds, the other installs Windows on it."


Audio Download

Posted on 16 April 2020 | 8:00 am


#176 How python implements super long integers

Sponsored by DigitalOcean: pythonbytes.fm/digitalocean

Topic #0: Quick chat about COVID 19

Brian #1: What the heck is pyproject.toml?

  • Brett Cannon
  • pyproject.toml
    • PEP 517 and 518 define what this file looks like and how to use it to build projects
  • We’re familiar with it being used for flit and poetry based projects.
  • Not so much with setuptools, but it does work with setuptools.
  • You can add configuration for non-build related activities, such as coverage, tox, even though those tools support their own config files.
  • Black is gaining popularity, probably more so than the use of flit.
    • Black only uses pyproject.toml for configuration (what little config is available. But there is some.)
  • So: a project adds use of Black, ends up configuring it with pyproject.toml, but doesn’t specify build steps. Now builds are broken. :(
  • Brett has the answers.
  • Add the following to pyproject.toml. Then go read the rest of Brett’s article. It’s good.
    [build-system]
    requires = ["setuptools >= 40.6.0", "wheel"]
    build-backend = "setuptools.build_meta"

Michael #2: Awesome Python Bytes Awesome List

  • By Jack McKew
  • Will be adding to this repo whenever I hear about awesome packages (in my opinion), PRs are welcome for anyone else though!
  • Already has 5 PRs accepted
  • Comes with graphics!!! Like all good presentations should.
  • Some fun projects this made me recall:
    • Great Expectations - for validating, documenting, and profiling, your data
    • pandas-vet - a plugin for flake8 that provides opinionated linting for pandas code.
    • GeoAlchemy - Using SQLAlchemy with Spatial Databases.
    • vue.py - Provides Python bindings for Vue.js. It uses brython to run Python in the browser.
  • Remember we have speedy search for our content over at pythonbytes.fm/search

Brian #3: Publishing package distribution releases using GitHub Actions CI/CD workflows

  • PyPA
  • You’ve moved to flit (or not) and started using GitHub actions to build and test whenever you push to GitHub. So awesome.
  • But now, there’s still a manual step to remember to publish to PyPI.
  • And maybe we should be checking the publish step more often against the Test PyPI server.
  • This article is a step by step walkthrough.
  • It’s a bit dated, 3.7. So I’m trying to walk through all the steps with my cards project and it will be finished by the time this episode goes live.
  • Stumbling blocks right now:
    • I’ve left my email blank, no email for author or maintainer in pyproject.toml, because neither flit, nor pip require it. But PyPI still does. grrrr.
      • Trying to decide between: normal email, setting up a new email for it, using a me+pypi gmail alias, setting up a new email address just for pypi, etc.
    • test pypi fails due to “file already exists”, so, that’s always gonna be the case unless I bump the version, so gonna have to try to figure out a way around that.

Michael #4: Rich text for terminals

  • Rich is a Python library for rich text and beautiful formatting in the terminal.
  • Add colorful text (up to 16.7 million colors) with styles (bold, italic, underline etc.) to your script or application.
  • Rich can also render pretty tables, progress bars, markdown, syntax highlighted source code, and tracebacks -- out of the box.
  • Centered or justified text
  • Tables, tables!
  • Syntax highlighted code
  • Markdown!
  • Can replace print() and does pretty printing of dictionaries with color.
  • Good Windows support for the new Windows Terminal
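  • A small sketch of the console and table APIs:
    from rich.console import Console
    from rich.table import Table

    console = Console()
    console.print("Hello, [bold magenta]Python Bytes[/bold magenta]!")

    table = Table(title="Episodes")
    table.add_column("Number")
    table.add_column("Title")
    table.add_row("176", "How python implements super long integers")
    console.print(table)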

Brian #5: psutil: Cross-platform lib for process and system monitoring in Python

  • “psutil (process and system utilities) is a cross-platform library for retrieving information on running processes and system utilization (CPU, memory, disks, network, sensors) in Python. It is useful mainly for system monitoring, profiling and limiting process resources and management of running processes. It implements many functionalities offered by classic UNIX command line tools such as ps, top, iotop, lsof, netstat, ifconfig, free and others.”
  • Useful for an incredible amount of information about the system you are running on:
    • cpu times, stats, load, number of cores
    • memory size and usage
    • disk partitions, usage
    • sensors, including battery
    • users
    • processes and process management
      • getting ids, names, etc.
      • cpu, memory, connections, files, threads, etc per process
      • signaling processes, like suspend, resume, kill
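  • A few one-liners to give the flavor:
    import psutil

    psutil.cpu_percent(interval=1)             # overall CPU usage over one second
    psutil.virtual_memory().percent            # RAM usage
    psutil.disk_usage("/").percent             # disk usage for a partition
    for proc in psutil.process_iter(["pid", "name"]):
        print(proc.info)                       # {'pid': ..., 'name': ...} per process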

Michael #6: How python implements super long integers

  • by Arpit Bhayani
  • In C, you worry about picking the right data type and qualifiers for your integers; at every step, you need to think whether int would suffice or whether you should go for a long, or even further to a long long.
  • In python, you need not worry about these "trivial" things because python supports integers of arbitrary size.
  • 2 ** 20000 overflows in C, whereas in Python it’s fine: a 6,021-digit result. But how?!
  • Variable-size objects, integers included, start with this header:
    typedef struct {
        PyObject ob_base;
        Py_ssize_t ob_size; /* Number of items in variable part */
    } PyVarObject;
  • Other types that have PyObject_VAR_HEAD are
    • PyBytesObject
    • PyTupleObject
    • PyListObject
    # Python's number:
    struct _longobject {
        PyObject ob_base;
        Py_ssize_t ob_size; /* Number of items in variable part */
        digit ob_digit[1];
    };
  • A "digit" is base 230 hence if you convert 1152921504606846976 into base 230 you get 100
  • Operations on super long integers
    • Addition: integers are persisted "digit-wise", so addition is as simple as what we learned in grade school
    • Subtraction: same
    • Multiplication: to keep things efficient, CPython implements the Karatsuba algorithm, which multiplies two n-digit numbers in O(n^log2 3), roughly O(n^1.585), elementary steps
  • Optimization of commonly-used integers: Python preallocates small integers in a range of -5 to 256. This allocation happens during initialization
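  • Seeing it from the Python side:
    >>> n = 2 ** 20000
    >>> n.bit_length()
    20001
    >>> len(str(n))       # the 6,021-digit result mentioned above
    6021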

Extras:

Michael:

  • We're coming to YouTube, probably. :)
  • npm is joining GitHub

Joke:


Audio Download

Posted on 7 April 2020 | 8:00 am


#175 Python string theory with superstring.py

Sponsored by Datadog: pythonbytes.fm/datadog

Special Guest: Matt Harrison

Topic #0: Quick chat about COVID 19.

  • What does your world look like?
  • Amusing to see news channels, daily shows, etc, learning what we podcasters have figured out years ago

Brian #1: Dictionary Merging and Updating in Python 3.9

  • Yong Cui, Ph.D.
  • Python 3.9, scheduled for Oct release, will introduce new merge (|) and update (|=) operators, a.k.a. union operators
  • Available in alpha 4 and later
  • see also pep 584
    # merge
    d1 = {'a': 1, 'b': 2}
    d2 = {'c': 3, 'd': 4}
    d3 = d1 | d2
    # d3 is now {'a': 1, 'b': 2, 'c': 3, 'd': 4}

    # update
    d1 = {'a': 1, 'b': 2}
    d1 |= {'c': 3, 'd': 4}
    # d1 is now {'a': 1, 'b': 2, 'c': 3, 'd': 4}

    # last one wins when keys collide, for both | and |=
    d1 = {'a': 1, 'b': 2}
    d1 |= {'a': 10, 'c': 3, 'd': 4}
    # d1 is now {'a': 10, 'b': 2, 'c': 3, 'd': 4}

Matt #2: superstring

  • An efficient library for heavy-text manipulation in Python, that achieves a remarkable memory and CPU optimization.
  • Uses Rope (data structure) and optimization techniques.
  • Performance comparisons for 50,000 char text
    • memory: 1/20th
    • speed: 1/5th
  • Features
    • Fast and Memory-optimized
    • Rich API
      • concatenation (a + b)
      • len() and .length()
      • indexing
      • slicing
      • strip
      • lower
      • upper
  • Similar functionalities to python built-in string
  • Easy to embed and use.
  • I wonder if any of these optimizations could be brought into CPython
  • Beware, it’s lacking tests

Michael #3: New pip resolver to roll out this year

  • via PyCoders
  • The developers of pip are in the process of developing a new resolver for pip (as announced on the PSF blog last year).
  • As part of that work, there will be some major changes to how pip determines what to install, based on package requirements.
  • What will change:
    • It will reduce inconsistency: it will no longer install a combination of packages that is mutually inconsistent.
    • It will be stricter - if you ask pip to install two packages with incompatible requirements, it will refuse (rather than installing a broken combination, like it does now).
  • What you can do to help
    • First and most fundamentally, please help us understand how you use pip by talking with our user experience researchers.
    • Even before we release the new resolver as a beta, you can help by running pip check on your current environment.
    • Please make time to test the new version of pip, probably in May.
    • Spread the word!
    • And if you develop or support a tool that wraps pip or uses it to deliver part of your functionality, please make time to test your integration with our beta in May

Matt #4: Covid-19 Data

  • Think global act local
  • Problem - No local data
  • Made my own plots - current status no predictions
  • ML works ok for basic model
  • Implementing an SIR model with ordinary differential equations via SciPy’s odeint function (sketch below)
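  • A bare-bones SIR sketch with odeint (parameter values are arbitrary, just to show the shape of the code):
    import numpy as np
    from scipy.integrate import odeint

    def sir(y, t, beta, gamma):
        s, i, r = y
        return [-beta * s * i, beta * s * i - gamma * i, gamma * i]

    t = np.linspace(0, 160, 161)                 # days
    s0, i0, r0 = 0.99, 0.01, 0.0                 # initial susceptible/infected/recovered fractions
    curves = odeint(sir, (s0, i0, r0), t, args=(0.3, 0.1))   # beta=0.3, gamma=0.1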

Brian #5: Why does all() return True if the iterable is empty?

  • Carl Johnson
  • Q: “Why does all() return True if the iterable is empty? Shouldn’t it return False just like if my_list: would evaluate to False if the list is empty? What’s the thinking behind it returning True?”
  • Lesson 1: "… basically doesn’t matter. The Python core team chose to make all([]) return True, and whatever their reasons, you can program your way around by adding wrapper functions or if tests.”
  • Lesson 2: “all unicorns are blue”
  • Lesson 3: “This is literally a 2,500 year old debate in philosophy. The ancients thought “all unicorns are blue” should be false because there are no unicorns, but modern logic says it is true because there are no unicorns that aren’t blue. Python is just siding with modern predicate logic, but your intuition is also quite common and was the orthodox position until the last few hundred years.”
  • Blog post goes into teaching about predicate logic, Socrates, Aristotelean syllogisms, and such.
  • And, really, no answer to why. But now, I’ll never forget that all([]) == True.
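  • In the REPL:
    >>> all([])
    True
    >>> any([])
    False
    >>> all(unicorn == "blue" for unicorn in [])   # "all unicorns are blue": vacuously true
    True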

Michael #6: pytest-monitor

  • written by Jean-Sébastien Dieu
  • pytest plugin for analyzing resource usage during test sessions
  • Analyze your resource consumption through test functions:
    • memory consumption
    • time duration
    • CPU usage
  • Keep a history of your resource consumption measurements.
  • Compare how your code behaves between different environments.
  • Usage: Simply run pytest as usual: pytest-monitor is active by default as soon as it is installed.
  • After running your first session, a .pymon sqlite database will be accessible in the directory where pytest was run.
  • You will need a valid Python 3.5+ interpreter. To get measures, we rely on:
    • psutil to extract CPU usage
    • memory_profiler to collect memory usage
    • and pytest (obviously!)

Extras:

Michael:

  • switchlang is now on pypi : pip install switchlang
  • markdown-subtemplate is now on pypi: pip install markdown-subtemplate

Joke:

Light timer fix: https://twitter.com/Sarcastic_Pharm/status/1238060786658009089


Audio Download

Posted on 1 April 2020 | 8:00 am


#174 Happy developers use Python 3

Sponsored by us! Talk Python courses & pytest book.

Topic #0: Quick chat about COVID 19.

Brian #1: Documentation as a way to build Community

  • Melissa Mendonça
  • “… educational materials can have a huge impact and effectively bring people into the community.”
  • Quality documentation for OSS is often lacking due to:
    • decentralized development
    • documentation is not as glamorous or as praised as new features or major bug fixes
    • “Even when the community is welcoming, documentation is often seen as a "good first issue", meaning that the docs end up being written by the least experienced contributors in the community.”
  • Possible solution:
    • organize/re-organize docs into:
      • tutorials
      • how-tos
      • reference guide
      • explanations
    • consequences:
      • Improving on the quality and discoverability
      • Clear difference between docs aimed at different users
      • Give users more opportunities to contribute, generating content that can be shared directly on the official documentation
      • Building a documentation team as a first-class team in the project, which helps create an explicit role as documentation creator. This helps people better identify how they can contribute beyond code.
      • Diversifying our contributor base, allowing people from different levels of expertise and different life experiences to contribute. This is also extremely important so that we have a better understanding of our community and can be accessible, unbiased and welcoming to all people.
  • Referenced in article: "What nobody tells you about documentation"

Michael #2: The Django Speed Handbook: making a Django app faster

  • By Shibel Mansour
  • Speed of your app is very important: 100ms is an eternity. SEO, user conversions, bounce rates, etc.
  • Use the tried-and-true django-debug-toolbar.
    • Analyze your request/response cycles and see where most of the time is spent.
    • Provides database query execution times and provides a nice SQL EXPLAIN in a separate pane that appears in the browser.
  • ORM/Database: Two ORM functionalities I want to mention first: these are select_related and prefetch_related. Nice 24x perf improvement example in the article. Basically, beware of the N+1 problem.
  • Indexes: Be sure to add them but they slow writes.
  • Pagination: Use it if you have lots of data
  • Async / background tasks.
  • Content size: Shrunk 9x by adding gzip middleware
  • Static files: minify and bundle as you can, cache, serve through nginx, etc.
    • At Python Bytes, Talk Python, etc, we use webassets, cssmin, and jsmin.
  • PageSpeed from Google, talk python’s ranking.
  • ImageOptim (for macOS, others)
  • Lazy-loading images: Lazily loading images means that we only request them when or a little before they enter the client’s (user’s) viewport. With excellent, dependency-free JavaScript libraries like LazyLoad, there really isn’t an excuse to not lazy-load images. Moreover, Google Chrome natively supports the lazy attribute.
  • Remember: Test and measure everything, before and after.

Brian #3: dacite: simplifies creation of data classes from dictionaries

  • Konrad Hałas
  • dataclasses are awesome
    • quick and easy
    • fields can
      • have default values
      • be excluded from comparison and/or repr and more
  • data often gets to us in dictionaries
  • Converting from dict to dataclass is trivial for trivial cases: x = MyClass(**data_as_dict)
  • For more complicated conversions, you need dacite
  • dacite.from_dict supports:
    • nested structures
    • optional fields and unions
    • collections
    • type_hooks, which allow you to have custom converters for certain types
  • strict mode: normally, extra input data that doesn’t match up with fields is just ignored, but you can use strict to disallow that.
  • Raises exceptions when something weird happens, like the wrong type, missing values, etc.
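  • A small sketch of a nested conversion (the Address/Person classes are made up for illustration):
    from dataclasses import dataclass
    from dacite import from_dict

    @dataclass
    class Address:
        city: str

    @dataclass
    class Person:
        name: str
        address: Address

    data = {"name": "Brian", "address": {"city": "Portland"}}
    person = from_dict(data_class=Person, data=data)   # nested dict becomes nested dataclasses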

Michael #4: How we retired Python 2 and improved developer happiness

  • By Barry Warsaw
  • The Python Clock is at 0:00.
  • In 2018, LinkedIn embarked on a multi-quarter effort to fully transition to a Python 3 code base.
  • In total, the effort entailed the migration of about 550 code repositories.
  • They don't use Python as a monolithic web service in their product; instead they have hundreds of independent microservices and tools, and dozens of supporting libraries, all owned by independent teams in separate repositories.
  • In the early days, most of the internal libraries were ported to be “bilingual,” meaning they could be used in either Python 2 or 3.
  • Given that the migration affected all of LinkedIn engineering across so many disparate teams and thousands of engineers, the effort was overseen by their Horizontal Initiatives (HI) program.
  • Phase 1: In the first quarter of 2019, we performed detailed dependency graphing, identifying a number of repositories that were more foundational, and thus needed to be fully ported first because they blocked the ports of everything that depended on them.
  • Phase 2: In the second quarter of 2019, we identified the remainder of repositories that needed porting
  • Post-migration reflections: Our primary indicator for completing the migration of a multiproduct was that it built successfully and passed its unit and integration tests.
  • For other organizations planning or in the midst of their own migration paths, we offer the following guidelines:
    • Plan early, and engage your organization’s Python experts. Find and leverage champions in your affected teams, and promote the benefits of Python 3.
    • Adopt the bilingual approach to supporting libraries so that consumers of your libraries can port to Python 3 on their own schedules.
    • Invest in tests and code coverage—these will be your best success metrics.
    • Ensure that your data models are explicit and clear, especially in identifying which data are bytes and which are human-readable text.
  • Benefits:
    • No longer have to worry about supporting Python 2 and have seen our support loads decrease.
    • Can now depend on the latest open source libraries and tools, and free from the constrictions of having to write bilingual Python.
    • Opportunistically and enthusiastically adopting type hinting and the mypy type checker, improving the overall quality, craft, and readability of Python code bases.

Brian #5: The Troublesome Active Record Pattern

  • Cal Paterson
  • "Object relational mappers" (ORMs) exist to bridge the gap between the programmers' friend (the object), and the database's primitive (the relation).
  • Examples include Django ORM and SQLAlchemy
  • The Active Record pattern of data access is marked by:
    1. A whole-object basis
    2. Access by key (mostly primary key)
  • Problem: Queries that don’t need all information for objects retrieve it all anyway, and it’s easy to code for loops to select or collect info that are wildly inefficient.
    • how many books are there
    • how many books about software testing written by Oregon authors
  • Problem: transactions. people can forget to use transactions, some ORMs don’t support them, they are not taught in beginner tutorials, etc.
    • SQLAlchemy has sessions
    • Django has atomic()
  • REST APIs can suffer the same problems.
  • Solutions:
    • just use SQL
    • first class queries
    • first class transactions
    • avoid Active Record style access patterns
    • Be careful with REST APIs
      • Alternatives:
        • GraphQL
        • RPC-style APIs

Michael #6: Types at the edges in Python

  • By Steve Brazier
  • For a new web service in Python there are 3 things to start with: error reporting, Pydantic, and mypy.
  • Why? Because which error would you rather track down: AttributeError: 'NoneType' object has no attribute 'strip', or none is not an allowed value (type=type_error.none.not_allowed)?
  • We then launch this code into production and our assumptions are tested against reality. If we’re lucky our assumptions turn out to be correct. If not we likely encounter some cryptic NoneType errors like the one at the start of this post.
  • Pydantic can help by formalizing our assumptions.
  • mypy carries on helping: Once you see the error at the start of this post (thanks error reporting) you know what is wrong about assumptions. Make the following change to your code: field: typing.Optional[str]
  • BTW: FastAPI integrates with Pydantic out of the box.
  • A mini-kata like exercise here that can be worked through: meadsteve/types-at-the-edges-minikata
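  • A minimal Pydantic (v1-style) sketch of moving that error to the edge; UserPayload is a made-up model:
    from typing import Optional
    from pydantic import BaseModel, ValidationError

    class UserPayload(BaseModel):
        username: str
        nickname: Optional[str] = None   # explicitly allowed to be missing

    try:
        UserPayload(username=None)
    except ValidationError as err:
        print(err)   # "none is not an allowed value" at the boundary, not deep inside the app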

Extras:

Michael:

Joke:


Audio Download

Posted on 26 March 2020 | 8:00 am