Python Bytes

by Michael Kennedy and Brian Okken

Python Bytes is a weekly podcast hosted by Michael Kennedy and Brian Okken. The show is a short discussion on the headlines and noteworthy news in the Python, developer, and data science space.

  

Latest Episodes

#169 Jupyter Notebooks natively on your iPad

Sponsored by Datadog: pythonbytes.fm/datadog

Brian #1: D-Tale

  • suggested by @davidouglasmit via twitter
  • “D-Tale is the combination of a Flask back-end and a React front-end to bring you an easy way to view & analyze Pandas data structures. It integrates seamlessly with ipython notebooks & python/ipython terminals. Currently this tool supports such Pandas objects as DataFrame, Series, MultiIndex, DatetimeIndex & RangeIndex.”
  • way cool UI for visualizing data
  • Live Demo shows
    • Describe shows column statistics, graph, and top 100 values
    • filter, correlations, charts, heat map

Michael #2: Carnets

  • by Nicolas Holzschuch
  • A standalone Jupyter notebooks implementation for iOS.
  • The power of Jupyter notebooks. In your pocket. Anywhere. Everything runs on your device. No need to set up a server, no need for an internet connection.
  • Standard packages like Numpy, Matplotlib, Sympy and Pandas are already installed. You're ready to edit notebooks.
  • Carnets uses iOS 11 filesharing ability. You can store your notebooks in iCloud, access them using other apps, share them.
  • Extended keyboard: on iPads, you get an extended toolbar with basic actions on your keyboard.
  • Install more packages: Add more Python packages with %pip (if they are pure Python).
  • OpenSource: Carnets is entirely OpenSource, and released under the FreeBSD license.

Brian #3: BeeWare Podium

  • suggested by Katie McLaughlin, @glasnt on twitter
  • NOT a pip install, download a binary from https://github.com/beeware/podium/releases
  • Linux and macOS
  • Still early, so you gotta do the open-and-trust-from-the-apps-directory thing for running stuff not from the App Store. But oh man, is it worth it.
  • HTML5-based presentation frameworks are cool: run a presentation right in your browser. My favorite has been remark.js
    • presenter mode,
      • notes are especially useful while practicing a talk
      • running timer super helpful while giving a talk
    • write talk in markdown, so it’s super easy to version control
    • issues:
      • presenter mode, full screen, with extended monitor hard to do.
      • notes and timer on laptop, full presentation on extended screen
      • super cool but requires full screening with mouse
  • Podium
    • uses similar syntax as remark.js and I think uses remark under the hood.
    • but it’s a native app, not a browser
    • Handles the presenter mode and extended screen smoothly, like keynote and others.
    • Removes the need for boilerplate html in your markdown file (remark.js md files have cruft).
  • Can’t wait to try this out for my next presentation

Michael #4: pytest-mock-resources

  • via Daniel Cardin
  • pytest fixture factories to make it easier to test against code that depends on external resources like Postgres, Redshift, and MongoDB.
  • Code which depends on external resources such as databases (postgres, redshift, etc) can be difficult to write automated tests for.
  • Conventional wisdom might be to mock or stub out the actual database calls and assert that the code works correctly before/after the calls.
  • Whether the actual query did the correct thing truly requires that you execute the query.
  • Having tests depend upon a real postgres instance running somewhere is a pain, very fragile, and prone to issues across machines and test failures.
  • Therefore pytest-mock-resources (primarily) works by managing the lifecycle of docker containers and providing access to them inside your tests.

Brian #5: How James Bennett is testing in 2020

  • Follow up from Testing Django applications in 2018
  • Favors unittest over pytest.
  • tox for testing over multiple Django and Python versions, including tox-travis plugin
  • pyenv for local Python installation management and pyenv-virtualenv plugin for venvs.
  • Custom runtests.py for setting up environment and running tests.
  • Changed to src/ directory layout.
  • Coverage and reporting failure if coverage dips, with a healthy perspective: “… this isn’t because I have 100% coverage as a goal. Achieving that is so easy in most projects that it’s meaningless as a way to measure quality. Instead, I use the coverage report as a canary. It’s a thing that shouldn’t change, and if it ever does change I want to know, because it will almost always mean something else has gone wrong, and the coverage report will give me some pointers for where to look as I start investigating.”
  • Testing is more than tests, it’s also black, isort, flake8, mypy, and even spell checking sphinx documentation.
  • Using tox.ini for utility scripts, like cleanup, pipupgrade, …

Michael #6: Python and PyQt: Building a GUI Desktop Calculator

  • by Leodanis Pozo Ramos at realpython
  • Some interesting take-aways:
  • Basics of PyQt
    • Widgets: QWidget is the base class for all user interface objects, or widgets. These are rectangular-shaped graphical components that you can place on your application’s windows to build the GUI.
    • Layout Managers: Layout managers are classes that allow you to size and position your widgets at the places you want them to be on the application’s form.
    • Main Windows: Most of the time, your GUI applications will be Main Window-Style. This means that they’ll have a menu bar, some toolbars, a status bar, and a central widget that will be the GUI’s main element.
    • Applications: The most basic class you’ll use when developing PyQt GUI applications is QApplication. This class is at the core of any PyQt application. It manages the application’s control flow as well as its main settings.
    • Signals and Slots: PyQt widgets act as event-catchers. Widgets always emit a signal, which is a kind of message that announces a change in its state.
  • Due to Qt licensing, you can only use the free version for non-commercial projects or internal, non-redistributed ones, or you can purchase a commercial license for $5,500/yr/dev.

Extras

Brian

  • PyCascades 2020 livestream videos of day 1 & day 2 are available.
    • Huge shout-out and thank you to all of the volunteers for this event.
    • In particular Nina Zakharenko for calming me down before my talk.

Michael

Joke

  • Why do programmers confuse Halloween with Christmas? Because OCT 31 == DEC 25.
  • Speed dating is useless. 5 minutes is not enough to properly explain the benefits of the Unix philosophy.


Audio Download

Posted on 19 February 2020 | 8:00 am


#168 Race your donkey car with Python

Sponsored by DigitalOcean: pythonbytes.fm/digitalocean

Special guest: Kojo Idrissa!

Michael #1: donkeycar

  • Have you ever seen a proper RC car race?
  • Donkeycar is a minimalist and modular self-driving library for Python.
  • It is developed for hobbyists and students with a focus on allowing fast experimentation and easy community contributions.
  • Use Donkey if you want to:
    • Make an RC car drive itself.
    • Compete in self-driving races like DIY Robocars
    • Experiment with autopilots, mapping, computer vision and neural networks.
    • Log sensor data (images, user inputs, sensor readings).
    • Drive your car via a web or game controller.
    • Leverage community contributed driving data.
    • Use existing CAD models for design upgrades.

Brian #2: RIP Pipenv: Tried Too Hard. Do what you need with pip-tools.

  • Nick Timkovich
  • No releases of pipenv in 2019. It “has been held back by several subdependencies and a complicated release process”
  • main benefits of pipenv: pin everything and use hashes for verifying packages
    • The two-file concept (Pipfile and Pipfile.lock) is pretty cool and useful
  • But we can do that with pip-tools command line tool pip-compile, which is also used by pipenv:
    • pip-compile --generate-hashes --output-file requirements.txt requirements.in
  • What about virtual environment support?
    • python -m venv venv --prompt $(basename $PWD) or equivalent for your shell works fine, and it’s built in.

Kojo #3: str.casefold()

  • used for caseless matching
  • “Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string.”
  • especially helpful for Unicode characters
    first_string = "der Fluß"
    second_string = "der Fluss"

    # ß is equivalent to ss
    if first_string.casefold() == second_string.casefold():
        print('The strings are equal.')
    else:
        print('The strings are not equal.')

    # prints "The strings are equal."
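casefold() handles ß, but strings that look identical can still differ in code points, e.g. "é" as one precomposed character versus "e" plus a combining accent. A common recipe (it mirrors Unicode's canonical caseless matching) normalizes before and after folding with the standard library's unicodedata module. caseless_equal is our own helper name, not a built-in.

```python
import unicodedata


def caseless_equal(a: str, b: str) -> bool:
    """Compare strings ignoring case and composed/decomposed accent differences."""
    def fold(s: str) -> str:
        # Decompose, fold case, then decompose again, because casefold()
        # can produce characters that need re-normalizing.
        return unicodedata.normalize("NFD", unicodedata.normalize("NFD", s).casefold())
    return fold(a) == fold(b)


print("der Fluß".lower() == "der Fluss".lower())          # False: lower() keeps ß
print("der Fluß".casefold() == "der Fluss".casefold())    # True: ß folds to ss
print("Caf\u00e9".casefold() == "CAFE\u0301".casefold())  # False: different code points
print(caseless_equal("Caf\u00e9", "CAFE\u0301"))          # True
```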

Michael #4: Virtualenv

  • via Brian Skinn
  • Virtualenv 20.0.0 beta1 is available
  • Announcement by Bernat Gabor
  • Why the major release
  • I identified three main pain points:
    • Creating a virtual environment is slow (it takes around 3 seconds, even in offline mode; 3 seconds does not seem that long, but if you need to create tens of virtual environments, it quickly adds up).
    • The API used within PEP-405 is excellent if you want to create virtual environments; however, only that. It does not allow us to describe the target environment flexibly or to do that without actually creating the environment.
    • The duality of virtualenv versus venv. Since Python 3.3, the standard library has shipped the venv module defined by PEP-405. In theory, we could switch to that and forget virtualenv. However, it is not that simple: virtualenv offers a few benefits that venv does not.
  • Benefits over venv
    • Ability to discover alternate versions (-p 2 creates a python 2 virtual environment, -p 3.8 a python 3.8, -p pypy3 a PyPy 3, and so on).
    • virtualenv includes the wheel package among its seed packages out of the box; this significantly improves package installation speed, as pip can now use its wheel cache when installing packages.
    • You are guaranteed to work even when distributions decide not to ship venv (Debian derivatives notably make venv an extra package, and not part of the core binary).
    • Can be upgraded out of band from the host python (often via just pip/curl - so can pull in bug fixes and improvements without needing to wait until the platform upgrades venv).
    • Easier to extend, e.g., we added Xonsh activation script generation without much pushback, support for PowerShell activation on POSIX platforms.

Brian #5: Property-based tests for the Python standard library (and builtins)

  • Zac Hatfield-Dodds and Paul Ganssle, so far.
  • Goal: Find and fix bugs in Python, before they ship to users.
  • “CPython's existing test suite is good, but bugs still slip through occasionally. We think that using property-based testing tools - i.e. Hypothesis - can help with this. They're no magic bullet, but computer-assisted testing techniques routinely try inputs that humans wouldn't think of (or bother trying), and turn up bugs that humans missed.”
  • “Writing tests that describe every valid input often leads to tighter validation and cleaner designs too, even when no counterexamples are found!”
  • “We aim to have a compelling proof-of-concept by PyCon US, and be running as part of the CPython CI suite by the end of the sprints.”
  • Hypothesis and property-based testing are superb to throw at algorithmic pure functions, and the test criteria are relatively straightforward for function pairs that have round-trip logic, like tokenize/untokenize, encode/decode, compress/decompress, etc. And there are probably tons of those types of methods in Python.
  • At the very least, I’m interested in this to watch how other people are using hypothesis.
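The round-trip property is easy to see even without Hypothesis. This dependency-free sketch just throws random byte strings at an encode/decode pair and checks decode(encode(x)) == x. Hypothesis adds what this does not: smarter input generation, shrinking of failing examples, and a database of past failures. check_round_trip is a hypothetical helper.

```python
import base64
import random
import zlib


def check_round_trip(encode, decode, trials=200, max_len=64, seed=0):
    """Assert decode(encode(x)) == x for many random byte strings.

    Returns the number of inputs checked; the fixed seed keeps runs reproducible.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(max_len + 1)))
        result = decode(encode(data))
        assert result == data, f"round trip failed for {data!r}"
    return trials


check_round_trip(zlib.compress, zlib.decompress)
check_round_trip(base64.b64encode, base64.b64decode)
```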

Kojo #6: PyCon US Tutorial Schedule & Registration

  • Find the schedule at https://us.pycon.org/2020/schedule/tutorials/
  • They tend to sell out FAST
  • Videos are up fast afterwards
  • What’s interesting to me?
    • Migration from Python 2 to 3
    • Welcome to Circuit Python (Kattni Rembor)
    • Intro to Property-Based Testing
    • Minimum Viable Documentation (Heidi Waterhouse)

Extras

Michael:

Joke

See the cartoon:

https://trello-attachments.s3.amazonaws.com/58e3f7c543422d7f3ad84f33/5df14f77efb5642d017a593f/31cba5cdf0e9805d47837916555dd7ab/b5cb6570af72883f06c3dcbf47679e9d.jpg


Audio Download

Posted on 11 February 2020 | 8:00 am


#167 Cheating at Kaggle and uWSGI in prod

Sponsored by Datadog: pythonbytes.fm/datadog

Special guest: Vicki Boykis: @vboykis

Michael #1: clize: Turn functions into command-line interfaces

  • via Marcelo
  • Follow up from Typer on episode 164.
  • Features
    • Create command-line interfaces by creating functions and passing them to clize.run (https://clize.readthedocs.io/en/stable/api.html#clize.run).
    • Enjoy a CLI automatically created from your functions’ parameters.
    • Bring your users familiar --help messages generated from your docstrings.
    • Reuse functionality across multiple commands using decorators.
    • Extend Clize with new parameter behavior.
  • I love how this is pure Python without its own API for the default case
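To make "a CLI created from your functions' parameters" concrete, here is a toy, standard-library-only illustration built on inspect and argparse. It is not how clize works and supports far less (no *args, no docstring parsing, no type conversion: every value arrives as a string). cli_from_function is a hypothetical name. Positional parameters become positional arguments, keyword-only parameters become --options, and a bool default becomes a flag.

```python
import argparse
import inspect


def cli_from_function(func, argv=None):
    """Build an argparse CLI from func's signature, parse argv, and call func."""
    parser = argparse.ArgumentParser(prog=func.__name__, description=func.__doc__)
    params = inspect.signature(func).parameters
    empty = inspect.Parameter.empty
    for name, param in params.items():
        if param.kind is inspect.Parameter.KEYWORD_ONLY:
            flag = "--" + name.replace("_", "-")
            if isinstance(param.default, bool):
                # A bool default becomes a flag; passing it stores True.
                parser.add_argument(flag, dest=name, action="store_true",
                                    default=param.default)
            elif param.default is empty:
                parser.add_argument(flag, dest=name, required=True)
            else:
                parser.add_argument(flag, dest=name, default=param.default)
        elif param.default is empty:
            parser.add_argument(name)  # required positional argument
        else:
            parser.add_argument(name, nargs="?", default=param.default)
    values = vars(parser.parse_args(argv))
    # Split parsed values back into positional and keyword-only arguments.
    positional = [values.pop(name) for name, param in params.items()
                  if param.kind is not inspect.Parameter.KEYWORD_ONLY]
    return func(*positional, **values)


def hello(name, *, greeting="Hello", shout=False):
    """Greet someone."""
    message = f"{greeting}, {name}!"
    return message.upper() if shout else message
```

For example, cli_from_function(hello, ["World", "--shout"]) returns "HELLO, WORLD!", and --help is generated by argparse.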

Vicki #2: How to cheat at Kaggle AI contests

  • Kaggle is a platform, now owned by Google, that allows data scientists to find data sets, learn data science, and participate in competitions
  • Many people participate in Kaggle competitions to sharpen their data science/modeling skills
  • Recently, a competition that was related to analyzing pet shelter data resulted in a huge controversy
  • Petfinder.my is a platform that helps people find pets to rescue in Malaysia from shelters. In 2019, they announced a collaboration with Kaggle to create a machine learning predictor algorithm of which pets (worldwide) were more likely to be adopted based on the metadata of the descriptions on the site.
  • The total prize offered was $25,000
  • After several months, a contestant won. He was previously a Kaggle grandmaster, and won $10k.
  • A volunteer, Benjamin Minixhofer, offered to put the algorithm in production, and when he did, he found that there was a huge discrepancy between first and second place
  • Technical Aspects of the controversy:
    • The data they gave asked the contestants to predict the speed at which a pet would be adopted, from 1-5, and included input features like type of animal, breed, coloration, whether the animal was vaccinated, and adoption fee
    • The initial training set had 15k animals and the teams, after a couple months, were then given 4k animals that their algorithms had not seen before as a test of how accurate they were (common machine learning best practice).
    • In a Kaggle Kernel (a hosted Jupyter notebook), Minixhofer explains how the winning team cheated
    • First, they individually scraped Petfinder.my to find the answers for the 4k test data
    • Using MD5, they created a hash for each unique pet and looked up the score for each hash in the external dataset; there were 3,500 overlaps
    • Used pandas column manipulation to get at the hidden prediction variable for every 10th pet, replacing the prediction that should have been generated by the algorithm with the actual value
    • Using mostly: obfuscated functions, Pandas, and dictionaries, as well as MD5 hashes
  • Fallout:
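The technical core of the leak described under Technical Aspects is a plain dictionary lookup keyed by a hash of each pet's features. This sketch uses made-up records and hypothetical field names (not the real competition schema) to show the mechanism: hash the scraped records, then overwrite the model's prediction for every 10th test row that has a match.

```python
import hashlib


def pet_key(record, fields=("name", "breed", "age", "fee")):
    """MD5 of selected feature values; identical pets get identical keys."""
    raw = "|".join(str(record[f]) for f in fields)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def apply_leak(test_rows, predictions, scraped_rows):
    """Replace every 10th prediction with the scraped true label, when one exists."""
    leaked = {pet_key(row): row["adoption_speed"] for row in scraped_rows}
    patched = list(predictions)
    for i, row in enumerate(test_rows):
        if i % 10 == 0:
            true_label = leaked.get(pet_key(row))
            if true_label is not None:
                patched[i] = true_label
    return patched
```

Only a fraction of rows is touched, so the score improves without looking perfect.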

Michael #3: Configuring uWSGI for Production Deployment

  • We run a lot of uWSGI backed services. I’ve spoken in-depth back on Talk Python 215: The software powering Talk Python courses and podcast about this.
  • This is guidance from Bloomberg Engineering’s Structured Products Applications group
  • We chose uWSGI as our host because of its performance and feature set. But, while powerful, uWSGI’s defaults are driven by backward compatibility and are not ideal for new deployments.
  • There is also an official Things to Know doc.
  • Unbit, the developer of uWSGI, has “decided to fix all of the bad defaults (especially for the Python plugin) in the 2.1 branch.” The 2.1 branch is not released yet.
  • Warning: I had trouble with die-on-term and systemctl
  • Settings I’m using:
[uwsgi]
# This option tells uWSGI to fail to start if any parameter
# in the configuration file isn’t explicitly understood by uWSGI.
strict = true

# The master uWSGI process is necessary to gracefully re-spawn
# and pre-fork workers, consolidate logs, and manage many other features
master = true

# uWSGI disables Python threads by default, as described in the Things to Know doc.
enable-threads = true

# This option will instruct uWSGI to clean up any temporary files or UNIX sockets it created
vacuum = true

# By default, uWSGI starts in multiple interpreter mode
single-interpreter = true

# Prevents uWSGI from starting if it is unable to find or load your application module
need-app = true

# uWSGI provides some functionality which can help identify the workers
auto-procname = true
procname-prefix = pythonbytes-

# Forcefully kill workers after 60 seconds. Without this feature,
# a stuck process could stay stuck forever.
harakiri = 60
harakiri-verbose = true

Vicki #4: Thinc: A functional take on deep learning, compatible with Tensorflow, PyTorch, and MXNet

  • A deep learning library that abstracts away some TensorFlow and PyTorch boilerplate, from Explosion
  • Already runs under the covers in spaCy, Explosion’s NLP library
  • type checking, particularly helpful for Tensors: PyTorchWrapper and TensorFlowWrapper classes and the intermingling of both
  • Deep support for numpy structures and semantics
  • Assumes you’re going to be using stochastic gradient descent
  • And operates in batches
  • Also cleans up the configuration and hyperparameters
  • Mainly hopes to make it easier and more flexible to do matrix manipulations, using a codebase that already existed but was not customer-facing.
  • Examples and code are all available in notebooks in the GitHub repo

Michael #5: pandas-vet

  • via Jacob Deppen
  • A plugin for Flake8 that checks pandas code
  • Starting with pandas can be daunting.
  • The usual internet help sites are littered with different ways to do the same thing and some features that the pandas docs themselves discourage live on in the API.
  • Makes pandas a little more friendly for newcomers by taking some opinionated stances about pandas best practices.
  • The idea to create a linter was sparked by Ania Kapuścińska's talk at PyCascades 2019, "Lint your code responsibly!"
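A Flake8 plugin like pandas-vet works by walking the syntax tree of your code and reporting patterns it considers bad practice. The toy checker below, written with the standard ast module, flags two patterns often discouraged in pandas code: inplace=True keywords and the .ix indexer (removed in pandas 1.0). The X001/X002 codes are invented for this sketch and are not pandas-vet's.

```python
import ast


def find_pandas_smells(source):
    """Return sorted (line, message) pairs for discouraged pandas idioms."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            # Matches calls like df.dropna(inplace=True)
            for kw in node.keywords:
                if (kw.arg == "inplace"
                        and isinstance(kw.value, ast.Constant)
                        and kw.value.value is True):
                    problems.append((node.lineno, "X001 avoid inplace=True"))
        elif isinstance(node, ast.Attribute) and node.attr == "ix":
            # Matches any attribute access named .ix, e.g. df.ix[0]
            problems.append((node.lineno, "X002 .ix is removed; use .loc or .iloc"))
    return sorted(problems)


code = "df.dropna(inplace=True)\nx = df.ix[0]\ndf.dropna(inplace=False)\n"
for line, message in find_pandas_smells(code):
    print(line, message)
```

A real Flake8 plugin wraps this kind of visitor in a class that Flake8 discovers through an entry point.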

Vicki #6: NumPy beginner documentation

  • NumPy is the backbone of numerical computing in Python: pandas (which I mentioned before), scikit-learn, TensorFlow, and PyTorch all lean heavily on, if not directly depend on, its core concepts. These include matrix operations through a data structure known as the NumPy array, ndarray, which is different from a Python list.
  • Anne Bonner wrote up new documentation for NumPy that introduces these fundamental concepts to beginners coming to both Python and scientific computing
  • Before, you went directly to the section about arrays and had to search through it to find what you wanted. The new guide, which is very nice, includes a step-by-step explanation of how arrays work, how to reshape them, and illustrated guides to basic array operations.

Extras:

Vicki

  • I write a newsletter, Normcore Tech, about all things tech that I’m not seeing covered in the mainstream tech media. I’ve written before about machine learning, data for NLP, Elon Musk memes, and Nginx.
  • There’s a free version that goes out once a week and paid subscribers get access to one more newsletter per week, but really it’s more about the idea of supporting in-depth writing about tech. vicki.substack.com

Michael:

  • pip 20.0 Released - Default to doing a user install (as if --user was passed) when the main site-packages directory is not writeable and user site-packages are enabled, cache wheels built from Git requirements, and more.
  • Homebrew: brew install python@3.8

Joke:

An SEO expert walks into a bar, bars, pub, public house, Irish pub, tavern, bartender, beer, liquor, wine, alcohol, spirits...


Audio Download

Posted on 3 February 2020 | 8:00 am


#166 Misunderstanding software clocks and time

Sponsored by DigitalOcean: pythonbytes.fm/digitalocean

Michael #1: Amazon is now offering quantum computing as a service

  • Amazon Braket – A fully managed service that allows scientists, researchers, and developers to begin experimenting with computers from multiple quantum hardware providers in a single place.
  • We all know about bits. Quantum computers use a more sophisticated data representation known as a qubit or quantum bit. Each qubit can exist in state 1 or 0, but also in superpositions of 1 and 0, meaning that the qubit simultaneously occupies both states. Such states can be specified by a two-dimensional vector that contains a pair of complex numbers, making for an infinite number of states. Each of the complex numbers is a probability amplitude: its squared magnitude gives the probability of measuring the qubit as a 0 or a 1, respectively.
  • Amazon Braket is a new service designed to let you get some hands-on experience with qubits and quantum circuits. You can build and test your circuits in a simulated environment and then run them on an actual quantum computer.
  • See linked announcement. Language looks familiar:
    bell = Circuit().h(0).cnot(0, 1)
    print(device.run(bell, s3_folder).result().measurement_counts())
  • How it Works: Quantum computers work by manipulating the amplitudes of the state vector. To program a quantum computer, you figure out how many qubits you need, wire them together into a quantum circuit, and run the circuit. When you build the circuit, you set it up so that the correct answer is the most probable one, and all the rest are highly improbable.
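To see amplitudes at work, the Bell circuit from the announcement (h(0) followed by cnot(0, 1)) can be simulated by hand with a four-entry state vector. This is a plain-Python sketch with no Braket dependency; the basis ordering (qubit 0 as the left bit) is our own convention.

```python
import math

# Amplitudes for basis states |q0 q1>, index = 2*q0 + q1: |00>, |01>, |10>, |11>
start = [1 + 0j, 0j, 0j, 0j]  # begin in |00>


def hadamard_q0(s):
    """Apply H to qubit 0: mixes |0 q1> and |1 q1> for each value of q1."""
    r = 1 / math.sqrt(2)
    out = list(s)
    for q1 in (0, 1):
        a, b = s[q1], s[2 + q1]
        out[q1] = r * (a + b)
        out[2 + q1] = r * (a - b)
    return out


def cnot_q0_q1(s):
    """CNOT with control qubit 0 and target qubit 1: swaps |10> and |11>."""
    return [s[0], s[1], s[3], s[2]]


bell = cnot_q0_q1(hadamard_q0(start))
probabilities = {format(i, "02b"): abs(amp) ** 2 for i, amp in enumerate(bell)}
print(probabilities)  # '00' and '11' each near 0.5; '01' and '10' are 0
```

Measuring this state gives 00 or 11 with equal probability and never 01 or 10, so on an ideal simulator the measurement counts for the Bell circuit should contain only those two outcomes.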

Brian #2: A quick-and-dirty guide on how to install packages for Python

  • Brett Cannon
  • Good modern intro to venv use.
  • Pro
    • short. simple. quick
    • uses --prompt in every example (more people need to use this)
      • and suggests using the directory name containing the env.
    • send it to all your co-workers that STILL aren’t using virtual environments
    • hints at an improved form of --prompt coming in Python 3.9
  • Con
    • uses .venv; I’m a venv (no dot) kinda guy
    • hints at an improved form of --prompt coming in Python 3.9
      • --prompt . will deduce the directory name. In 3.8 it just names your env “.”.
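The same prompt setting is available from Python through the standard library's venv module, which can be handy in project bootstrap scripts. A minimal sketch, assuming you want the prompt to be the project directory's name as suggested above; make_project_venv is a hypothetical helper, and with_pip=False is only there to keep the example fast.

```python
import os
import venv
from pathlib import Path


def make_project_venv(project_dir, env_name="venv"):
    """Create <project_dir>/<env_name> with the prompt set to the project's name."""
    project = Path(project_dir).resolve()
    builder = venv.EnvBuilder(
        with_pip=False,              # skip installing pip to keep this quick
        symlinks=(os.name != "nt"),  # like the venv CLI default on POSIX
        prompt=project.name,         # same idea as --prompt $(basename $PWD)
    )
    env_dir = project / env_name
    builder.create(env_dir)
    return env_dir
```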

Michael #3: Say No to the no code movement

  • Article by Alex Hudson
  • 2020 is going to be the year of “no code”: the movement that says you can write business logic and even entire applications without having the training of a software developer.
  • Every company is a software company
  • But software devs are in short supply and outcomes are variable
  • two distinct benefits to transitioning business processes into the software domain
    • “change control” becomes a software problem rather than a people problem.
    • it’s easier to innovate on what makes a business distinct.
  • The basic problem with “no code”
  • the idea of writing business logic in text form according to the syntax of a technical programming language is anathema.
  • The “simpler abstraction” misconception
  • The “simpler syntax” misconception
  • Configuration over code: Many No Code advocates are building significant systems by pulling together off-the-shelf applications and integrating them. But the logic has been implemented as configuration as opposed to code.
  • The equivalence of code: There are reasons why developers still use plain text; if something came along that was better, many (not all!) developers would drop text like a hot rock.
  • Where does “No code” fail in practice? 80% there and then …
  • Where does “No code” succeed? “No Code” systems are extremely good for putting together proofs-of-concept which can demonstrate the value of moving forward with development.

Brian #4: What I learned going from prison to Python

  • Shadeed “Sha” Wallace-Stepter
  • Presented at North Bay Python
  • I got this recommended to me by many people, even those not in the Python community, including my good friends Chuck Forbes and Dr. Donna Beegle, who work to fight poverty.
  • Amazing story. Go listen to it.

Michael #5: A real QUICK → Qt5 based gUI generator for ClicK

  • Via Ricky Teachey.
  • Inspired by Gooey, the GUI generator for classical Python argparse-based command line programs.
  • Take a standard Click-based app, add --gui to the command line and you get a GUI!

Brian #6: Falsehoods programmers believe about time

All of these assumptions are wrong

  1. There are always 24 hours in a day.
  2. Months have either 30 or 31 days.
  3. A week always begins and ends in the same month.
  4. The system clock will always be set to the correct local time.
  5. The system clock will always be set to a time that is not wildly different from the correct local time.
  6. If the system clock is incorrect, it will at least always be off by a consistent number of seconds.
  7. It will never be necessary to set the system time to any value other than the correct local time.
  8. Ok, testing might require setting the system time to a value other than the correct local time but it will never be necessary to do so in production.
  9. Human-readable dates can be specified in universally understood formats such as 05/07/11.

… from more …

  1. The day before Saturday is always Friday.
  2. Two subsequent calls to a getCurrentTime() function will return distinct results.
  3. The second of two subsequent calls to a getCurrentTime() function will return a larger result.
  4. The software will never run on a space ship that is orbiting a black hole.
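A couple of these are easy to poke at with the standard library. calendar knows February has 28 or 29 days, and for the getCurrentTime() items Python separates the wall clock (time.time(), which follows system clock adjustments) from time.monotonic(), which cannot go backwards. A small sketch; month_lengths and elapsed_seconds are hypothetical helpers.

```python
import calendar
import time


def month_lengths(year):
    """Number of days in each month of the given year."""
    return [calendar.monthrange(year, month)[1] for month in range(1, 13)]


def elapsed_seconds(func, *args):
    """Time a call using the monotonic clock instead of the wall clock.

    time.time() can jump backwards if the system clock is adjusted;
    time.monotonic() never does, although two calls may return the same value.
    """
    start = time.monotonic()
    func(*args)
    return time.monotonic() - start


print(month_lengths(2020)[1], month_lengths(2019)[1])  # 29 28
print(elapsed_seconds(sum, range(1_000_000)) >= 0)     # True
```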

Extras

Michael:

Joke

https://twitter.com/mbbillz/status/921119218703257600


Audio Download

Posted on 27 January 2020 | 8:00 am


#165 Ranges as dictionary keys - oh my!

Sponsored by DigitalOcean: pythonbytes.fm/digitalocean

Brian #1: iterators, generators, coroutines

  • Cool quick read article by Mark McDonnell.
  • Starts with an attempt at a gentle introduction to the iterator protocol (why does everyone think that users need to start with this info?) Muscle through this part or just skim it. Should be an appendix.
  • Generators (start here): functions that use yield
  • Unbound generators: they don’t stop
  • Generator Expressions: Like for v in ("foo" for i in range(5)): …
    • Use parens instead of brackets, otherwise they are like list comprehensions.
    • Specifically: (expression for item in collection if condition)
  • Generators using generators / nested generators : yield from
  • Given bar() and baz() are generators, this works:
    def foo():
        yield from bar()
        yield from baz()
  • Coroutines are an extension of generators
    • “Generators use the yield keyword to return a value at some point in time within a function, but with coroutines the yield directive can also be used on the right-hand side of an = operator to signify it will accept a value at that point in time.”
  • Then….. coroutine example, some asyncio stuff, … honestly I got lost.
  • Bottom line:
    • I’m still looking for a great tutorial on coroutines that
      • doesn’t explain the iterator protocol (boring!)
      • shows an example NOT using asyncio and NOT a REPL example
    • I want to know how I can make use of coroutines in an actual program (toy ok) where the use of coroutines actually helps the structure and makes it more maintainable, etc.
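For what it's worth, here is one shape of a non-asyncio, non-REPL coroutine program: a small push-based pipeline where each stage is a generator that receives values via send(). It follows the style popularized by David Beazley's coroutine talks; the stage names and log lines are made up. The @coroutine decorator just primes each generator with next() so it is ready to receive.

```python
import functools


def coroutine(func):
    """Decorator: create the generator and advance it to its first yield."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return start


@coroutine
def grep(pattern, target):
    """Forward only lines containing pattern to the next stage."""
    while True:
        line = yield
        if pattern in line:
            target.send(line)


@coroutine
def count_by_level(counts):
    """Tally lines by their first word, e.g. 'ERROR' or 'WARN'."""
    while True:
        line = yield
        level = line.split(maxsplit=1)[0]
        counts[level] = counts.get(level, 0) + 1


def feed(lines, target):
    """Push every line into the pipeline, then close the first stage."""
    for line in lines:
        target.send(line)
    target.close()


log = ["ERROR disk full", "INFO started", "WARN disk 90%", "ERROR disk gone"]
counts = {}
feed(log, grep("disk", count_by_level(counts)))
print(counts)  # {'ERROR': 2, 'WARN': 1}
```

The structural win is that each stage only knows about the next one, so stages can be added, dropped, or reordered (e.g. feeding count_by_level directly) without touching the others.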

Michael #2: requests-toolbelt

  • A toolbelt of useful classes and functions to be used with requests
  • multipart/form-data encoder - The main attraction is a streaming multipart form-data object, MultipartEncoder.
  • User-Agent constructor - You can easily construct a requests-style User-Agent string
  • SSLAdapter - Allows the user to choose one of the SSL protocols made available in Python's ssl module for outgoing HTTPS connections
  • ForgetfulCookieJar - prevents a particular requests session from storing cookies

Brian #3: Pandas Validation

  • We covered Bulwark in episode 162
  • There are other approaches and projects looking at the same problem.
  • pandas-validation
    • Suggested by Lance
    • “… pandas-validation lets you create a template of what your pandas dataframe should look like and it'll validate the entire dataframe against that template. So if you have a dataframe with first column being strings second column being dates and the third being address, you can use a mixture of built in validate types to ensure your data conforms to that. It will even let you set up some regex and make sure that the data in a column conforms to that regex.” - Lance
    • supports dates, timestamps, numeric values, strings
  • pandera
    • “pandera provides a flexible and expressive API for performing data validation on tidy (long-form) and wide data to make data processing pipelines more readable and robust.”
    • “pandas data structures contain information that pandera explicitly validates at runtime. This is useful in production-critical or reproducible research settings.”
    • “pandera enables users to:
      • Check the types and properties of columns in a DataFrame or values in a Series.
      • Perform more complex statistical validation.
      • Seamlessly integrate with existing data analysis/processing pipelines via function decorators.”
  • A few different approaches. I can’t really tell from the outside if there is a clear winner or solution that’s working better for most cases. I’d like to hear from listeners which they use, if any. Or if we missed the obvious validation method most people are using.

Michael #4: qtpy

  • I have been inspired to check out Qt again, but the libraries and versions are confusing.
  • Provides a uniform layer to support PyQt5, PySide2, PyQt4 and PySide with a single codebase
  • Basically, you can write your code as if you were using PySide2 but import Qt modules from qtpy instead of PySide2 (or PyQt5).

Brian #5: pylightxl

  • Viktor Kis submission
  • “A light weight, zero dependency, minimal functionality excel read/writer python library”
  • Well. Reader right now. Writing coming soon. :)
  • Some cool examples in the docs to get you started grabbing data from spreadsheets right away.
  • Features:
    • Zero non-standard library dependencies
    • Single source code that supports both Python37 and Python27. The light weight library is only 3 source files that can be easily copied directly into a project for those that have installation/download restrictions. In addition the library’s size and zero dependency makes pyinstaller compilation small and easy!
    • 100% test-driven development for highest reliability/maintainability with 100% coverage on all supported versions
    • API aimed to be user friendly, intuitive and to the point with no bells and whistles. Structure: database > worksheet > indexing
      • example: db.ws('Sheet1').index(row=1,col=2) or db.ws('Sheet1').address(address='B1')
    • Read excel files (.xlsx, .xlsm), all sheets or selective few for speed/memory management
    • Index cells data by row/col number or address
    • Calling an entire row/col of data returns an easy to use list output:
      • db.ws('Sheet1').row(1) or db.ws('Sheet1').rows
    • Worksheet data size is consistent for each row/col. Any cell that is empty returns an empty string ‘’

Michael #6: python-ranges

  • via Aiden Price
  • Continuous Range, RangeSet, and RangeDict data structures for Python
  • Best understood as an example:
    tax_info = RangeDict({
        Range(0, 9701):        (0,        0.10, 0),
        Range(9701,   39476):  (970,      0.12, 9700), 
        ... })

    income = int(input("What is your income? $"))
    base, marginal_rate, bracket_floor = tax_info[income]
    • Range and RangeSet objects are mutually compatible for things like union(), intersection(), difference(), and symmetric_difference()
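Under the hood, a lookup like tax_info[income] amounts to finding which non-overlapping interval contains the key. A minimal stdlib sketch of that idea searches sorted range starts with bisect. SimpleRangeDict is a hypothetical class, not part of python-ranges; it treats each range as half-open [start, end) and takes plain (start, end) tuples as keys. The two brackets are the ones from the example above.

```python
import bisect


class SimpleRangeDict:
    """Map non-overlapping half-open ranges [start, end) to values."""

    def __init__(self, mapping):
        # mapping: {(start, end): value}; keep ranges sorted by start for bisect
        self._items = sorted(mapping.items())
        self._starts = [start for (start, _end), _value in self._items]

    def __getitem__(self, key):
        # Index of the last range whose start is <= key
        i = bisect.bisect_right(self._starts, key) - 1
        if i >= 0:
            (start, end), value = self._items[i]
            if start <= key < end:
                return value
        raise KeyError(key)


tax_info = SimpleRangeDict({
    (0, 9701): (0, 0.10, 0),
    (9701, 39476): (970, 0.12, 9700),
})

income = 20_000
base, marginal_rate, bracket_floor = tax_info[income]
print(base + marginal_rate * (income - bracket_floor))  # about 2206
```

Each lookup is a binary search, so it stays fast even with many ranges.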

Extras:

  • Brian:
  • Michael:
    • Pandas goes 1.0 (via Jeremy Schendel). The team just put out a release candidate for 1.0 and will be using SemVer going forward.
    • PyCharm security from Anthony Shaw.
    • Video for Python for Decision Makers webcast is out.

Joke:

  • Optimist: The glass is half full.
  • Pessimist: The glass is half empty.
  • Engineer: The glass is twice as large as it needs to be.


Audio Download

Posted on 21 January 2020 | 8:00 am


#164 Use type hints to build your next CLI app

Sponsored by Datadog: pythonbytes.fm/datadog

Michael #1: Data-driven journalism via cjworkbench

  • via Michael Paholski
  • The data journalism platform with built-in training
  • Think spreadsheet + ETL automation
  • Designed around modular tools for data processing -- table in, table out -- with no code required
  • Features include:
    • Modules to scrape, clean, analyze and visualize data
    • An integrated data journalism training program
    • Connect to Google Drive, Twitter, and API endpoints.
    • Every action is recorded, so all workflows are repeatable and transparent
    • All data is live and versioned, and you can monitor for changes.
    • Write custom modules in Python and add them to the module library

Brian #2: remi: A platform-independent Python GUI library for your applications.

  • Python REMote Interface library.
  • “Remi is a GUI library for Python applications which transpiles an application's interface into HTML to be rendered in a web browser. This removes platform-specific dependencies and lets you easily develop cross-platform applications in Python!”
  • No dependencies. pip install git+https://github.com/dddomodossola/remi.git doesn’t install anything else.
  • Yes. Another GUI in a web page, but for quick and dirty internal tools, this will be very usable.
  • Basic app:
    import remi.gui as gui
    from remi import start, App

    class MyApp(App):
        def __init__(self, *args):
            super(MyApp, self).__init__(*args)

        def main(self):
            container = gui.VBox(width=120, height=100)
            self.lbl = gui.Label('Hello world!')
            self.bt = gui.Button('Press me!')
            self.bt.onclick.do(self.on_button_pressed)
            container.append(self.lbl)
            container.append(self.bt)
            return container

        def on_button_pressed(self, widget):
            self.lbl.set_text('Button pressed!')
            self.bt.set_text('Hi!')

    start(MyApp)

Michael #3: Typer

  • Build great CLIs. Easy to code.
  • Based on Python type hints.
  • Typer is FastAPI's little sibling. And it's intended to be the FastAPI of CLIs.
  • Declare the types of parameters (arguments and options) once, as function parameters.
  • You do that with standard modern Python types.
  • You don't have to learn a new syntax, the methods or classes of a specific library, etc.
  • Based on Click
  • Example (minimal version)
    import typer

    def main(name: str):
        typer.echo(f"Hello {name}")

    if __name__ == "__main__":
        typer.run(main)
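Typer reads the parameter types straight from the function signature. To show the idea (not Typer's implementation, which builds on Click), here is a standard-library sketch that turns a type-hinted function into an argparse CLI. `run_from_signature` and the `count`/`shout` options are hypothetical: parameters without defaults become positional arguments, parameters with defaults become `--options`, and `bool` options become flags.

```python
import argparse
import inspect

def run_from_signature(func, argv=None):
    """Build an argparse CLI from func's type hints, parse argv, and call func."""
    parser = argparse.ArgumentParser(description=func.__doc__)
    for name, param in inspect.signature(func).parameters.items():
        hint = param.annotation if param.annotation is not inspect.Parameter.empty else str
        flag = "--" + name.replace("_", "-")
        if param.default is inspect.Parameter.empty:
            parser.add_argument(name, type=hint)  # required positional argument
        elif hint is bool:
            parser.add_argument(flag, dest=name, action="store_true", default=param.default)
        else:
            parser.add_argument(flag, dest=name, type=hint, default=param.default)
    args = parser.parse_args(argv)
    return func(**vars(args))

def main(name: str, count: int = 1, shout: bool = False):
    """Greet NAME, optionally several times."""
    greeting = f"Hello {name}"
    lines = [greeting.upper() if shout else greeting] * count
    print("\n".join(lines))
    return lines

# Equivalent to running: python script.py World --count 2
run_from_signature(main, ["World", "--count", "2"])
```

Passing `argv=None` makes argparse read `sys.argv`, which is what a real entry point would do.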

Brian #4: Effectively using Matplotlib

  • Chris Moffitt
  • “… I think I was a little premature in dismissing matplotlib. To be honest, I did not quite understand it and how to use it effectively in my workflow.”
  • That very much sums up my relationship with matplotlib. But I’m ready to take another serious look at it.
  • one reason for the complexity is that there are 2 interfaces
    • MATLAB-like state-based interface
    • object-based interface (use this)
  • recommendations:
    • Learn the basic matplotlib terminology, specifically what a Figure and an Axes are.
    • Always use the object-oriented interface. Get in the habit of using it from the start of your analysis.
    • Start your visualizations with basic pandas plotting.
    • Use seaborn for the more complex statistical visualizations.
    • Use matplotlib to customize the pandas or seaborn visualization.
  • Runs through an example
  • Describes figures and plots
  • Includes a handy reference for customizing a plot.
  • Related: Stack Overflow answer that shows how to generate a matplotlib image and embed it in a Flask app without saving it to a file.
  • Style it with pylustrator.readthedocs.io :)
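Following the object-oriented recommendation, and in the spirit of the Stack Overflow answer, here is a sketch that creates a Figure and Axes directly (no pyplot state) and renders a PNG into memory so it can be embedded in an HTML response without writing a file. It mirrors the approach of matplotlib's own Flask embedding example; the function name and plot contents are hypothetical:

```python
import base64
from io import BytesIO

from matplotlib.figure import Figure

def plot_as_img_tag(xs, ys, title="Example"):
    """Render a line plot to PNG in memory and return an <img> tag with a data URI."""
    # Object-oriented interface: create the Figure and its Axes explicitly
    fig = Figure(figsize=(4, 3))
    ax = fig.subplots()
    ax.plot(xs, ys)
    ax.set(title=title, xlabel="x", ylabel="y")

    # Save into a memory buffer instead of a file on disk
    buf = BytesIO()
    fig.savefig(buf, format="png")
    data = base64.b64encode(buf.getbuffer()).decode("ascii")
    return f"<img src='data:image/png;base64,{data}'/>"

# In a Flask app, a view could simply return this string, e.g.:
# @app.route("/plot")
# def plot():
#     return plot_as_img_tag([0, 1, 2], [0, 1, 4])
```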

Michael #5: Django Simple Task

  • django-simple-task runs background tasks in Django 3 without requiring other services and workers.
  • It runs them in the same event loop as your ASGI application.
  • Here’s a simple overview of how it works:
    1. On application start, a queue is created and a number of workers start listening to the queue
    2. When defer is called, a task (function or coroutine function) is added to the queue
    3. When a worker gets a task, it runs it or delegates it to a threadpool
    4. On application shutdown, it waits for tasks to finish before the ASGI server exits
  • Django must be run with an ASGI server.
  • Example
    import asyncio
    import time

    from django.http import HttpResponse
    from django_simple_task import defer

    def task1():
        time.sleep(1)
        print("task1 done")

    async def task2():
        await asyncio.sleep(1)
        print("task2 done")

    def view(request):
        defer(task1)
        defer(task2)
        return HttpResponse(b"My View")
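The four steps above map naturally onto an `asyncio.Queue` plus worker tasks. A minimal, framework-free sketch of that pattern follows; `TaskRunner` and its method names are hypothetical, and this is not django-simple-task's actual code:

```python
import asyncio
import inspect

class TaskRunner:
    """Toy version of the queue-and-workers pattern described above."""

    def __init__(self, num_workers=2):
        self.num_workers = num_workers
        self.queue = None
        self.workers = []

    async def startup(self):
        # 1. On startup, create a queue and start workers listening on it
        self.queue = asyncio.Queue()
        self.workers = [asyncio.create_task(self._worker()) for _ in range(self.num_workers)]

    def defer(self, func):
        # 2. defer() only enqueues the task (a function or coroutine function)
        self.queue.put_nowait(func)

    async def _worker(self):
        loop = asyncio.get_running_loop()
        while True:
            func = await self.queue.get()
            try:
                # 3. Await coroutine functions; hand plain functions to the default thread pool
                if inspect.iscoroutinefunction(func):
                    await func()
                else:
                    await loop.run_in_executor(None, func)
            except Exception as exc:  # keep the worker alive if one task fails
                print(f"task {func.__name__} failed: {exc!r}")
            finally:
                self.queue.task_done()

    async def shutdown(self):
        # 4. On shutdown, wait for queued tasks to finish, then stop the idle workers
        await self.queue.join()
        for worker in self.workers:
            worker.cancel()
        await asyncio.gather(*self.workers, return_exceptions=True)

results = []

def sync_task():
    results.append("sync done")

async def async_task():
    await asyncio.sleep(0.01)
    results.append("async done")

async def demo():
    runner = TaskRunner()
    await runner.startup()
    runner.defer(sync_task)
    runner.defer(async_task)
    await runner.shutdown()

asyncio.run(demo())
```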

Brian #6: PyPI Stats at pypistats.org

  • Simple interface. Pop in a package name and get the download stats.
  • Example use: Why is my open source project now getting PRs and issues?
  • I’ve got a few packages on PyPI, not updated much.
    • cards and submark are mostly for demo purposes for teaching testing.
    • pytest-check is a pytest plugin that allows multiple failures per test.
  • I only hear about issues and PRs on one of these. So let’s look at traffic.
    • cards: 2 downloads per day, 24 per week, 339 per month
    • submark: 5 per day, 9 per week, 61 per month
    • pytest-check: 976 per day, 4,524 per week, 19,636 per month
  • That totally explains why I need to start actually supporting pytest-check. Cool.
  • Note: it’s still small.
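pypistats.org also offers a JSON API. A small sketch, assuming the `/api/packages/<package>/recent` endpoint returns `last_day`, `last_week`, and `last_month` counts under a `data` key (check the site's API page before relying on that shape); the helper names are hypothetical, and only `recent_downloads` needs network access:

```python
import json
from urllib.request import urlopen

# Assumed endpoint shape; see the API page on pypistats.org
API = "https://pypistats.org/api/packages/{package}/recent"

def parse_recent(payload: dict) -> dict:
    """Pull day/week/month download counts out of a 'recent' API response."""
    data = payload["data"]
    return {"day": data["last_day"], "week": data["last_week"], "month": data["last_month"]}

def recent_downloads(package: str) -> dict:
    """Fetch recent download counts for `package` (requires network access)."""
    with urlopen(API.format(package=package), timeout=10) as resp:
        return parse_recent(json.load(resp))
```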

Extras:

  • Comment from January Python PDX West meetup
    • “Please remember to have one beginner friendly talk per meetup.”
    • Good point.
    • Even if you can’t present here in Portland / Hillsboro, or don’t want to, I’d love to hear feedback on good beginner-friendly topics for meetups.
  • PyCascades 2020
    • discount code listeners-at-pycascades for 10% off
  • Firefox 72 is out with anti-fingerprinting and Picture-in-Picture (PiP) - Ars Technica

Joke:

Language essays comic


Audio Download

Posted on 16 January 2020 | 8:00 am


#163 Meditations on the Zen of Python

Sponsored by us! Support us by visiting pythonbytes.fm/biz [courses] and pythonbytes.fm/pytest [book], or becoming a patron at patreon.com/pythonbytes

Brian #1: Meditations on the Zen of Python

  • Moshe Zadka
  • The Zen of Python is not "the rules of Python" or "guidelines of Python". It is full of contradiction and allusion. It is not intended to be followed: it is intended to be meditated upon.
  • Moshe gives some of his thoughts on the different lines of the Zen of Python.
  • The full Zen of Python can be found here, or in a REPL with import this
  • A few highlights:
    • Beautiful is better than ugly
      • Consistency helps, so black, flake8, and pylint are useful.
      • “But even more important, only humans can judge what humans find beautiful. Code reviews and a collaborative approach to writing code are the only realistic way to build beautiful code. Listening to other people is an important skill in software development.”
    • Complex is better than complicated.
      • “When solving a hard problem, it is often the case that no simple solution will do. In that case, the most Pythonic strategy is to go "bottom-up." Build simple tools and combine them to solve the problem.”
    • Readability counts
      • “In the face of immense pressure to throw readability to the side and just "solve the problem," the Zen of Python reminds us: readability counts. Writing the code so it can be read is a form of compassion for yourself and others.”
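`import this` prints the Zen once, as a side effect of importing the module. If you want the aphorisms as data, CPython keeps the text ROT13-encoded in `this.s`; that attribute is an implementation detail rather than a documented API, so treat this sketch accordingly:

```python
import codecs
import contextlib
import io

# Importing `this` prints the Zen the first time; capture that output instead
with contextlib.redirect_stdout(io.StringIO()):
    import this

# this.s holds the text ROT13-encoded (CPython implementation detail)
zen = codecs.decode(this.s, "rot13")

# Skip the title line and the blank line after it, keep one aphorism per entry
aphorisms = [line for line in zen.splitlines()[2:] if line]
```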

Michael #2: nginx raided by Russian police

  • Russian police raided the Moscow offices of NGINX, Inc., a subsidiary of F5 Networks and the company behind the internet's most popular web server technology.
  • Russian search engine Rambler.ru claims full ownership of NGINX code.
  • Rambler claims that Igor Sysoev developed NGINX while he was working as a system administrator for the company, hence they are the rightful owner of the project.
  • Sysoev never denied creating NGINX while working at Rambler. In a 2012 interview, Sysoev claimed he developed NGINX in his free time and that Rambler wasn't even aware of it for years.
  • Update from F5:
  • Promptly following the event we took measures to ensure the security of our master software builds for NGINX, NGINX Plus, NGINX WAF and NGINX Unit—all of which are stored on servers outside of Russia. No other products are developed within Russia. F5 remains committed to innovating with NGINX, NGINX Plus, NGINX WAF and NGINX Unit, and we will continue to provide the best-in-class support you’ve come to expect.

Brian #3: I'm not feeling the async pressure

  • Armin Ronacher
  • “Async is all the rage.” But before you go there, make sure you understand flow control and back pressure.
  • “…back pressure is resistance that opposes the flow of data through a system. Back pressure sounds quite negative … but it's here to save your day.”
  • If parts of your system are async, you have to make sure the entire flow through the system doesn’t have overflow points.
  • He shows a reader/writer example that is way hairier than you’d think it should be.
  • “New Footguns: async/await is great but it encourages writing stuff that will behave catastrophically when overloaded.”
  • “So for you developers of async libraries here is a new year's resolution for you: give back pressure and flow control the importance they deserve in documentation and API.”
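In asyncio, a bounded `asyncio.Queue` is one simple way to get back pressure: once the queue holds `maxsize` items, `await queue.put(...)` suspends the producer until the consumer catches up (for streams, `await writer.drain()` plays a similar role). This is a generic illustration with hypothetical names, not the reader/writer example from Armin's post:

```python
import asyncio

async def producer(queue: asyncio.Queue, n_items: int, peak: list):
    for i in range(n_items):
        # put() suspends while the queue is full: this is the back pressure
        await queue.put(i)
        peak[0] = max(peak[0], queue.qsize())
    await queue.put(None)  # sentinel: no more items

async def slow_consumer(queue: asyncio.Queue, processed: list):
    while True:
        item = await queue.get()
        if item is None:
            break
        await asyncio.sleep(0.001)  # simulate slow downstream work
        processed.append(item)

async def main(maxsize=5, n_items=50):
    # Bounded queue: without maxsize the producer could buffer everything in memory
    queue = asyncio.Queue(maxsize=maxsize)
    processed, peak = [], [0]
    await asyncio.gather(producer(queue, n_items, peak), slow_consumer(queue, processed))
    return processed, peak[0]

processed, peak = asyncio.run(main())
print(f"processed {len(processed)} items, queue never held more than {peak}")
```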

Michael #4: codetiming from Real Python
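The idea behind codetiming is a `Timer` you can use with explicit start/stop calls, as a context manager, or as a decorator. A stdlib-only sketch of such a class, assuming `time.perf_counter` for measurement; it is not codetiming's source, and the `text`/`logger` parameters here are illustrative:

```python
import functools
import time

class Timer:
    """Minimal timer usable via start()/stop(), as a context manager, or as a decorator."""

    def __init__(self, text="Elapsed time: {:0.4f} seconds", logger=print):
        self.text = text
        self.logger = logger
        self._start = None
        self.last = None  # duration of the most recent measurement, in seconds

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        if self._start is None:
            raise RuntimeError("Timer is not running; call start() first")
        self.last = time.perf_counter() - self._start
        self._start = None
        if self.logger:
            self.logger(self.text.format(self.last))
        return self.last

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc_info):
        self.stop()

    def __call__(self, func):
        # Decorator use: time every call of func (not safe for recursive functions)
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self:
                return func(*args, **kwargs)
        return wrapper

with Timer():
    sum(range(1_000_000))
```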

Brian #5: Making Python Programs Blazingly Fast

  • Martin Heinz
    • Seemed like a good follow-up to the last topic
  • Profiling with:
    • the shell's time command: time python something.py
    • python -m cProfile -s time something.py
    • timing functions with a wrapper (decorator)
    • It doesn't mention timeit, but see that too: https://docs.python.org/3.8/library/timeit.html
  • How to make things faster:
    • use built-in types over custom types
    • caching/memoization with lru_cache
    • use local variables and local aliases when looping
    • use functions… (kinda duh, but sure).
    • don’t repeatedly access attributes in loops
    • use f-strings over other formatting
    • use generators, or at least experiment with them
      • the memory savings could result in a speedup
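A few of those tips in one place: `cProfile`/`pstats` to find the hot spot, `timeit` to measure a change, `lru_cache` for memoization, and a local alias inside a loop. A small sketch using naive Fibonacci as a stand-in workload (the workload and function names are illustrative, not from the article); actual timings will vary by machine:

```python
import cProfile
import io
import pstats
import timeit
from functools import lru_cache

def fib_plain(n):
    return n if n < 2 else fib_plain(n - 1) + fib_plain(n - 2)

@lru_cache(maxsize=None)
def fib_cached(n):
    # Same recursion, but each n is computed once and then served from the cache
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)

def squares(values):
    result = []
    append = result.append  # local alias: one attribute lookup instead of one per item
    for v in values:
        append(v * v)
    return result

def profile_report(func, *args, limit=5):
    """Profile a single call and return the top `limit` rows sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(*args)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("time").print_stats(limit)
    return out.getvalue()

# timeit: total seconds for `number` repetitions of each call
plain_s = timeit.timeit(lambda: fib_plain(20), number=10)
fib_cached.cache_clear()
cached_s = timeit.timeit(lambda: fib_cached(20), number=10)
print(f"plain: {plain_s:.4f}s  cached: {cached_s:.6f}s")
print(profile_report(fib_plain, 20))
```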

Michael #6: LocalStack

  • via Graham Williamson and Jan 'oglop' Gazda
  • A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline!
  • LocalStack spins up the following core Cloud APIs on your local machine:
  • LocalStack builds on existing best-of-breed mocking/testing tools, most notably kinesalite/dynalite and moto. While these tools are awesome (!), they lack functionality for certain use cases. LocalStack combines the tools, makes them interoperable, and adds important missing functionality on top of them
  • It has lots of configuration knobs, but it runs in Docker, which helps.

Extras:

Michael:

Joke: Types of software jobs.


Audio Download

Posted on 9 January 2020 | 8:00 am


#162 Retrofitting async and await into Django

Sponsored by DataDog: pythonbytes.fm/datadog

Special guest: Aly

Aly #1: Andrew Godwin - Just Add Await: Retrofitting Async into Django — DjangoCon 2019

  • Andrew is leading the implementation of asynchronous support for Django
  • Overview of Async Landscape
    • How synchronous and asynchronous code interact
    • Async functions are different from sync functions, which makes it hard to design APIs
  • Difficulties in adding Async support to Django
    • Django is a project that a lot of people are familiar with; its new async implementation also needs to feel familiar
  • The plan is to implement async capabilities in three phases:
  • Phase 1: ASGI Support (Django 3.0)
    • This phase lays the groundwork for future changes
    • ORM is async-aware: using it from async code raises a SynchronousOnlyOperation exception
  • Phase 2: Async Views, Async Handlers, and Async Middleware (Django 3.1)
    • Add async capabilities for the core part of the request path
    • There is a branch where things are mostly working; just a couple of tests need fixing
  • Phase 3: Async ORM (Django 3.2 / 4.0)
    • Largest, most difficult and most unbounded part of the project
    • ORM queries can result in lots of database lookups; have to be careful here
  • Async Project Wiki - project status, find out how to contribute
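Django's bridge between the two worlds lives in `asgiref` (`sync_to_async` / `async_to_sync`). The core idea, letting async code await blocking code without stalling the event loop, can be sketched with the standard library alone. This is a simplified illustration (no thread-sensitivity handling), not asgiref's implementation, and the names below are hypothetical:

```python
import asyncio
import functools
import time

def run_sync_in_thread(func):
    """Wrap a blocking function so async code can await it without blocking the loop."""
    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        loop = asyncio.get_running_loop()
        # Run in the default ThreadPoolExecutor; the event loop stays free meanwhile
        return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))
    return wrapper

def blocking_query(user_id):
    time.sleep(0.05)  # stands in for a synchronous ORM call
    return {"id": user_id}

async def view():
    fetch = run_sync_in_thread(blocking_query)
    # Both "queries" run concurrently in worker threads
    return await asyncio.gather(fetch(1), fetch(2))

results = asyncio.run(view())
```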

Brian #2: gamesbyexample

  • Al Sweigart
  • “PythonStdioGames : A collection of games (with source code) to use for example programming lessons. Written in Python 3. Click on the src folder to view all of the programs.”
  • I first learned programming by modifying games written by others and seeing what the different parts do when I change them. For me it was Lunar Lander on a TRS-80, and it took forever to type in the listing from the back of a magazine.
  • But now, you can just clone a repo and play with existing files.
  • Cool features:
    • They're short, with a limit of 256 lines of code.
    • They fit into a single source code file and have no installer.
    • They only use the Python standard library.
    • They only use stdio text; print() and input() in Python.
    • They're well commented.
    • They use as few programming concepts as possible. If classes, list comprehensions, or recursion aren't necessary for the program, then they aren't used.
    • Elegant and efficient code is worthless next to code that is easy to understand and readable. These programs are for education, not production. Standard best practices, like not using global variables, can be ignored to make it easier to understand.
    • They do input validation and are bug free.
    • All functions have docstrings.
  • There’s also a todo list if people want to help out.

Aly #3: Bulwark

  • Open-source library that allows users to property-test pandas DataFrames
    • Goal is to make it easy for data analysts and data scientists to write tests
  • Tests around data are different; data is not deterministic, which requires us to think about testing in a different way
    • With property tests, we can check that an object has a certain property
  • Property tests for DataFrames include validating the shape of the DataFrame, checking that a column is within a certain range, verifying a DataFrame has no NaNs, etc.
  • Bulwark allows you to implement property tests as checks. Each check
    • Takes a DataFrame and optional arguments
    • The check will make an assertion about a DataFrame property
    • If the assertion passes, the check will return the original, unaltered DataFrame
    • If the check fails, an AssertionError is raised and you have context around why it failed
  • Bulwark also allows you to implement property checks as decorators
    • This is useful if you design data pipelines as functions
      • Each function takes in input data, performs an action, and returns output
    • Add decorators that validate properties of the input DataFrame to pipeline functions
  • Lots of builtin checks and decorators; easy to add your own
  • Slides with example usage and tips: Property Testing with Pandas with Bulwark
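Bulwark ships its own checks and decorators; rather than guess at their exact signatures, here is a pandas-only sketch of the same pattern described above. Each check takes a DataFrame plus arguments, asserts a property, and returns the DataFrame unchanged, and a decorator runs checks on a pipeline function's input. All names here are hypothetical:

```python
import functools

import pandas as pd

def has_no_nans(df, columns=None):
    """Check: assert there are no NaNs (optionally only in `columns`); return df unchanged."""
    subset = df if columns is None else df[columns]
    nan_columns = [col for col in subset.columns if subset[col].isna().any()]
    assert not nan_columns, f"NaNs found in columns: {nan_columns}"
    return df

def is_within_range(df, column, low, high):
    """Check: assert every value of `column` lies in [low, high]; return df unchanged."""
    outside = ~df[column].between(low, high)
    assert not outside.any(), f"{int(outside.sum())} value(s) in {column!r} outside [{low}, {high}]"
    return df

def validate_input(*checks):
    """Decorator: run each check on the DataFrame passed as the first argument."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(df, *args, **kwargs):
            for check in checks:
                check(df)
            return func(df, *args, **kwargs)
        return wrapper
    return decorator

@validate_input(has_no_nans, functools.partial(is_within_range, column="age", low=0, high=120))
def add_decade(df):
    """Pipeline step: DataFrame in, new DataFrame out."""
    return df.assign(decade=(df["age"] // 10) * 10)
```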

Brian #4: Poetry 1.0.0

  • Sebastien Eustace
  • caution: not backwards compatible
  • full change log
  • Highlights:
    • Poetry is getting serious.
    • more ways to manage environments
      • switch between python versions in a project with poetry env use /path/to/python
      • or poetry env use python3.7
    • Improved support for private indices (instead of just PyPI)
      • can specify index per dependency
      • can specify a secondary index
      • can specify a non-PyPI index as default, avoiding PyPI
    • Environment variable support to more easily work with Poetry in a CI environment
    • Improved add command to allow for constraints, paths, directories, etc. for a dependency
    • publishing supports API tokens
    • marker specifiers on dependencies.

Aly #5: Kubernetes for Full-Stack Developers

  • With the rise of containers, Kubernetes has become the de facto platform for running and coordinating containerized applications across multiple machines
  • This guide follows steps new users would take when learning how to deploy applications to Kubernetes:
    • Learn Kubernetes core concepts
    • Build modern 12-Factor web applications
    • Get applications working inside of containers
    • Deploy applications to Kubernetes
    • Manage cluster operations
  • New to containers? Check out my Introduction to Docker talk

Brian #6: testmon: selects tests affected by changed files and methods

  • On a previous episode (159) we mentioned pytest-picked and I incorrectly assumed it would run tests related to code that has changed, ‘cause it says “Run the tests related to the unstaged files or the current branch (according to Git)”.
  • I was wrong, Michael was right. It runs the tests that are in modified test files.
  • What I was thinking of is “testmon” which does what I was hoping for.
    • “pytest-testmon is a pytest plugin which selects and executes only tests you need to run. It does this by collecting dependencies between tests and all executed code (internally using Coverage.py) and comparing the dependencies against changes. testmon updates its database on each test execution, so it works independently of version control.”
  • If you tried testmon before, like I did, be aware that there have been significant changes in 1.0.0
  • Very cool to see continued effort on this project.
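testmon uses Coverage.py to learn which code each test executes. A much-simplified sketch of the same select-by-dependency idea, using `sys.settrace` to record which tracked functions each test calls and a bytecode hash to detect edits. All names here are hypothetical, and real tools track far more (per-line coverage, module-level changes, a persistent database):

```python
import hashlib
import sys

def fingerprint(func):
    """Hash a function's bytecode and constants; the hash changes when its body changes."""
    code = func.__code__
    return hashlib.sha256(code.co_code + repr(code.co_consts).encode()).hexdigest()

def record_dependencies(test, tracked):
    """Run `test`, returning {name: fingerprint} for every tracked function it called."""
    by_code = {func.__code__: name for name, func in tracked.items()}
    called = set()

    def tracer(frame, event, arg):
        # The global trace function receives a "call" event for each new frame
        if event == "call" and frame.f_code in by_code:
            called.add(by_code[frame.f_code])
        return None  # no per-line tracing needed

    sys.settrace(tracer)
    try:
        test()
    finally:
        sys.settrace(None)
    return {name: fingerprint(tracked[name]) for name in called}

def select_tests(database, tracked):
    """Return the tests whose recorded dependencies changed or disappeared."""
    return sorted(
        test_name
        for test_name, deps in database.items()
        if any(name not in tracked or fingerprint(tracked[name]) != fp
               for name, fp in deps.items())
    )

# Demo: two functions under test, one test each
def add(a, b):
    return a + b

def mul(a, b):
    return a * b

tracked = {"add": add, "mul": mul}

def test_add():
    assert tracked["add"](2, 3) == 5

def test_mul():
    assert tracked["mul"](2, 3) == 6

database = {t.__name__: record_dependencies(t, tracked) for t in (test_add, test_mul)}
unchanged = select_tests(database, tracked)  # nothing edited yet

# Simulate editing mul(): same behavior, different body
def mul(a, b):
    product = a * b
    return product

tracked["mul"] = mul
changed = select_tests(database, tracked)
```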

Extras:

Joke:

  • From Tyler Matteson
    • Two coroutines walk into a bar.
    • RuntimeError: 'bar' was never awaited.
  • From Ben Sandofsky
    • Q: How many developers on a message board does it take to screw in a light bulb?
    • A: “Why are you trying to do that?”


Audio Download

Posted on 3 January 2020 | 8:00 am


#161 Sloppy Python can mean fast answers!

Sponsored by DigitalOcean: pythonbytes.fm/digitalocean

Special guest: Anthony Herbert

Anthony #1: Larry Hastings - Solve Your Problem With Sloppy Python - PyCon 2018

  • Michael’s personal automation tasks, things I do all the time:
    • stripe to sheets automation
    • urlify
    • tons of reporting
    • wakeup - to get 100 on Lighthouse
    • deploy (on my servers)
    • creating import data for video courses
    • measuring duration of audio files

Michael #2: Introduction to ASGI: Emergence of an Async Python Web Ecosystem

  • by Florimond Manca
  • Python's growth is not just about data science
  • Python web development is back with an async spin, and it's exciting.
  • One of the main drivers of this endeavour is ASGI, the Asynchronous Server Gateway Interface.
  • A guided tour about what ASGI is and what it means for modern Python web development.
  • Since Python 3.5 was released, the community has been literally async-ifying all the things. If you're curious, a lot of the resulting projects are now listed in aio-libs and awesome-asyncio.
  • An overview of ASGI
  • Why should I care? Interoperability is a strong selling point, but there are many more advantages to using ASGI-based components for building Python web apps:
    • Speed: the async nature of ASGI apps and servers make them really fast (for Python, at least) — we're talking about 60k-70k req/s (consider that Flask and Django only achieve 10-20k in a similar situation).
    • Features: ASGI servers and frameworks give you access to inherently concurrent features (WebSocket, Server-Sent Events, HTTP/2) that are impossible to implement using sync/WSGI.
    • Stability: ASGI as a spec has been around for about 3 years now, and version 3.0 is considered very stable. Foundational parts of the ecosystem are stabilizing as a result.
  • To get your hands dirty, try out any of the following projects:
    • uvicorn: ASGI server.
    • Starlette: ASGI framework.
    • TypeSystem: data validation and form rendering
    • Databases: async database library.
    • orm: asynchronous ORM.
    • HTTPX: async HTTP client w/ support for calling ASGI apps (useful as a test client).
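At its core, an ASGI application is just an async callable that takes `scope`, `receive`, and `send`. A minimal sketch of one (no framework), plus a tiny driver that calls it the way a server would, with a simplified fake HTTP scope. Saved as a hypothetical `hello_asgi.py`, the `app` could be served with `uvicorn hello_asgi:app`:

```python
import asyncio

async def app(scope, receive, send):
    """A bare ASGI application: one async callable taking (scope, receive, send)."""
    if scope["type"] != "http":
        return  # this sketch only handles HTTP requests
    body = f"Hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain"), (b"content-length", str(len(body)).encode())],
    })
    await send({"type": "http.response.body", "body": body})

async def call_app(path="/"):
    """Drive the app directly with a fake HTTP scope and collect the messages it sends."""
    scope = {"type": "http", "method": "GET", "path": path, "headers": []}
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await app(scope, receive, send)
    return sent

messages = asyncio.run(call_app("/hi"))
```

Driving an app directly like this is the same idea HTTPX builds on when it calls ASGI apps as a test client.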

Anthony #3: Python Insights

Michael #4: Assembly

  • via Luiz Honda
  • Assembly is a Pythonic, object-oriented, mid-stack, batteries-included framework built on Flask that adds structure to your Flask application and groups your routes by class.
  • Assembly allows you to build web applications in much the same way you would build any other object-oriented Python program.
  • Assembly helps you create small to enterprise level applications easily.
  • Decisions made for you + features: github.com/mardix/assembly#decisions-made-for-you--features

Examples, root URLs:

    from assembly import Assembly

    # Extending Assembly makes the class a route automatically
    # By default, Index will be the root URL
    class Index(Assembly):

        # index is the entry route
        # -> /
        def index(self):
            return "welcome to my site"

        # the method name becomes the route
        # -> /hello/
        def hello(self):
            return "I am a string"

        # underscores in method names are dasherized
        # -> /about-us/
        def about_us(self):
            return "I am a string"

Example of /blog.

    # The class name is part of the URL prefix
    # This will become -> /blog
    class Blog(Assembly):

        # index will be the root
        # -> /blog/
        def index(self):
            return [
                {
                    "title": "title 1",
                    "content": "content"
                },
                ...
            ]

        # with params. The order will be respected
        # -> /blog/comments/1234/
        # 1234 will be passed to id
        def comments(self, id):
            return [
                # ... the comments for post `id`
            ]
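The class-based routing convention above (class name as URL prefix, `index` as the root, underscores dasherized, extra method parameters as path segments) is easy to emulate. Below is a hypothetical, framework-free sketch that derives route rules from a plain class the same way; it is not Assembly's implementation, and it treats a class named `Index` as the site root per the comments above:

```python
import inspect

def dasherize(name: str) -> str:
    return name.replace("_", "-").lower()

def routes_for(cls) -> dict:
    """Map URL rules to method names using the conventions described above."""
    prefix = "" if cls.__name__ == "Index" else "/" + dasherize(cls.__name__)
    rules = {}
    for name, func in inspect.getmembers(cls, inspect.isfunction):
        if name.startswith("_"):
            continue  # skip private helpers and dunder methods
        path = prefix + "/" if name == "index" else f"{prefix}/{dasherize(name)}/"
        # Parameters after `self` become path segments, in declaration order
        params = list(inspect.signature(func).parameters)[1:]
        path += "".join(f"<{p}>/" for p in params)
        rules[path] = name
    return rules

class Index:
    def index(self):
        return "welcome to my site"

    def hello(self):
        return "I am a string"

    def about_us(self):
        return "I am a string"

class Blog:
    def index(self):
        return []

    def comments(self, id):
        return []
```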

Anthony #5: Building a Standalone GPS Logger with CircuitPython using @Adafruit and Particle hardware

Michael #6: 10 reasons python is good to learn

  • Python is popular and good to learn because, in Michael’s words, it’s a full-spectrum language.
  • And the reasons are:
  • Python Is Free and Open-Source
  • Python Is Popular, Loved, and Wanted
  • Python Has a Friendly and Devoted Community
  • Python Has Elegant and Concise Syntax
  • Python Is Multi-Platform
  • Python Supports Multiple Programming Paradigms
  • Python Offers Useful Built-In Libraries
  • Python Has Many Third-Party Packages
  • Python Is a General-Purpose Programming Language
  • Python Plays Nice with Others

Extras:

Michael:

Anthony:

Joke: The failed pickup line

  • A girl is hanging out at a bar with her friends.
  • Some guy comes up to her and says: “You are the ; to my line of code.”
  • She responds, “Get outta here, creep, I code in Python.”


Audio Download

Posted on 18 December 2019 | 8:00 am