Sandro Tosi

Empire State Building Lights iCalendar

2021-05-10T17:07:00.003-04:00

I'm very lucky to be able to see the Empire State Building from my apartment windows, and at night the lights are fantastic! But I'm also curious to know what today's lights are going to be, and tomorrow's, etc.

I thought I'd easily find a calendar to add to gCal to show that, but I wasn't able to find any, so I made one myself: https://sandrotosi.github.io/esb-lights-calendar/

Python: send emails with embedded images

2020-12-08T02:13:00.009-05:00

To send emails with images you need to use MIMEMultipart, but the basic approach:

import smtplib

from email.mime.multipart import MIMEMultipart
from email.mime.image import MIMEImage

msg = MIMEMultipart('alternative')
msg['Subject'] = "subject"
msg['From'] = from_addr
msg['To'] = to_addr

part = MIMEImage(open('/path/to/image', 'rb').read())
# without this attach() the image never makes it into the message
msg.attach(part)

s = smtplib.SMTP('localhost')
s.sendmail(from_addr, to_addr, msg.as_string())
s.quit()

will produce an email with an empty body and the image as an attachment.

A better way, i.e. having the image as part of the body of the email, requires writing an HTML body that refers to that image:

import smtplib

from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage

msg = MIMEMultipart('alternative')
msg['Subject'] = "subject"
msg['From'] = from_addr
msg['To'] = to_addr

text = MIMEText('<img src="cid:image1">', 'html')
msg.attach(text)

image = MIMEImage(open('/path/to/image', 'rb').read())

# Define the image's ID as referenced in the HTML body above
image.add_header('Content-ID', '<image1>')
msg.attach(image)

s = smtplib.SMTP('localhost')
s.sendmail(from_addr, to_addr, msg.as_string())
s.quit()

The trick is to define an image with a specific Content-ID and make that the only item in an HTML body: now you have an email which contains that specific image as the only content of the body, embedded in it.

Bonus point: if you want to take a snapshot of a webpage (which is kinda the reason I needed the code above) I found it extremely useful to use the Google PageSpeed Insights API; a good description of how to use that API with Python is available at this StackOverflow answer.

UPDATE (2020-12-26): I was made aware via email that some mail providers may not display images inline when the Content-ID value is too short (say, for example, Content-ID: 1). A solution that seems to work with most providers is to use a sequence of random characters prefixed with a dot and suffixed with a valid mail domain.
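A minimal sketch of one way to get a longer Content-ID, assuming Python 3: the standard library's email.utils.make_msgid() returns a unique '<...@domain>' string, which is close in shape to (though not exactly) the format suggested in the update. The build_message() helper, the 'example.com' domain and the fake image bytes are placeholders made up for illustration.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.utils import make_msgid


def build_message(image_bytes, subtype="png", domain="example.com"):
    """Build a message whose HTML body embeds the image via a generated Content-ID.

    `domain` is a placeholder; use a mail domain that makes sense for you.
    """
    # make_msgid() returns something like '<digits.digits.digits@example.com>'
    cid = make_msgid(domain=domain)

    msg = MIMEMultipart('alternative')
    # The HTML refers to the Content-ID without the surrounding angle brackets
    msg.attach(MIMEText('<img src="cid:%s">' % cid[1:-1], 'html'))

    # Passing the subtype explicitly avoids relying on image type detection
    image = MIMEImage(image_bytes, _subtype=subtype)
    image.add_header('Content-ID', cid)
    msg.attach(image)
    return msg, cid


# Placeholder bytes instead of a real image file
msg, cid = build_message(b"\x89PNG placeholder")
```

The message can then be sent with smtplib exactly as in the snippets above.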

It's a waiting game... but just how long we gotta wait?

2020-05-08T22:37:00.001-04:00

While waiting for my priority date to become current, and with enough "quarantine time" on my hands, I just came up with a very simple Python tool that parses the USCIS Visa Bulletin to gather some data from it.

You can find code and images in this GitHub repo.

For now it only contains a single plot for the EB3 final action date; it answers a simple question: how many months in the past your priority date has to be if you want to file your AOS in a given month. We started from FY2016, to cover the final full year of the Obama administration.
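As a rough sketch of the arithmetic behind that plot (not the tool's actual code): for each bulletin month, count the calendar months between the published final action date and the bulletin month itself. The function name and the dates below are made up for illustration, they are not real bulletin data.

```python
from datetime import date


def months_behind(bulletin_month, final_action_date):
    """Calendar months between a final action date and the bulletin month.

    Days are ignored: only the year and month of each date are compared.
    """
    return ((bulletin_month.year - final_action_date.year) * 12
            + (bulletin_month.month - final_action_date.month))


# Hypothetical example: a May 2020 bulletin with a 2017-01-01 cutoff
print(months_behind(date(2020, 5, 1), date(2017, 1, 1)))
```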

If you're interested in more classes/visas, let me know and the tool could be easily extended to cover that too. PRs are always welcome.

Attending the Codecademy

2012-11-21T12:29:00.000-05:00

As you've probably guessed, I'm surveying several sites for improving programming skills. This episode is about Codecademy.

It's a very well done site for people who want to learn a language. It has a Python track, along with several others: Ruby, jQuery, JavaScript & so on.

You'll be required to actually write code and run it! Yes, the code you write is then executed in a web "interpreter" (modified for educational purposes) and the output is displayed on screen. In one section it is also possible to write to files and have their contents shown in another tab.

I'd encourage you to start from it if you have never seen Python and you're willing to learn it from the ground up.

Spending a Sunday on CodingBat.com

2012-11-18T13:22:00.003-05:00

I've played a bit with Project Euler but all of their problems are math-centric, which is nice but not exactly what I'm looking for: some real-world programming problems to get back into the coding field.

So, asking my friend Google, I found CodingBat: it has a Python section with several tasks to complete. I must say some of them are kind of trivial to solve once you know some idiomatic Python, but others are a bit more interesting. If you're a junior Python coder, or want to get a grip on the language, give it a look.

Oh, and if you know of a website that would give me some real-world programming problems (something that would be useful on the job, not just coding for fun), I would love to hear from you.

Project Euler - Problem 4

2012-11-14T14:03:00.001-05:00

Here's my solution to Project Euler problem 4:

# handy function to check if a number is palindrome
def is_palindrome(i):
    return str(i) == str(i)[::-1]

# what's the max?
max_p = 0

# multiply all numbers `i` between 100 and 998, with all between i+1 and 999
for i in xrange(100, 999):
    for j in xrange(i+1, 1000):
        p = i * j
        if is_palindrome(p) and p > max_p:
            max_p = p

print max_p

Comments are welcome.

Update: fixed as per Alex's comment.

Mercurial: how to completely remove a named branch

2011-08-10T21:53:00.001-04:00

I like the git feature-branch workflow so much that, in the early days of Python development with Mercurial, I created some named branches; well, that is something you should not do.

In Mercurial, the changeset contains the branch name, so you cannot develop on a separate (named) branch, then merge into default and hope that the branch goes away, because it will stay.

What do I do now? The Python Mercurial repository is quite big (around 200 MB), so I wanted to avoid cloning it again. Thanks to the help of the folks on #mercurial (on the freenode IRC network) I found my solution: strip the branch!

Please note that strip is dangerous. Use it only as a last resort, and mind that you can lose data with it. That said, it's a very powerful tool :) My main aim was to completely remove those named branches, leave no traces, and lose the changes I made on them. Another important aspect is that I hadn't merged those branches into default.

So, how to get rid of a named branch:

$ hg strip "branch(${BRANCHNAME})"

and repeat for all the branches you have; that's it. Now, to be completely sure they were removed and no spurious changes are in the repository, you can:

$ hg pull -u
$ hg outgoing

and if it says "no changes found" you're sure that those branches are really gone.

And what am I now? A Python Core Developer!

2011-08-01T18:13:00.000-04:00

Yeah, as of a couple of hours ago I'm officially a Python Core Developer (and this confirms it, so I'm not dreaming!)

I'm now in that mixed state between happiness and the fear that I'll make stupid mistakes and be ashamed of myself. But hey, it's only those who do nothing that make no mistakes.

Interesting days ahead, a lot of procedures to learn and get used to, hopefully also a lot of bugs fixed :) One thing is for sure: I'll go step by step, following the "better be safe than sorry" rule.

Finally, I'd like to thank all the people at Python who made this possible. They are quite a number, so if I named them, I'd surely forget someone, and that would be unfair! So well, you know who you are, and this big THANK YOU is yours :)

I'm going to EuroPython 2011

2011-05-18T19:10:00.000-04:00

I just got confirmation that my company will sponsor me for EuroPython 2011 (thanks Register.it), so I'll be able to attend the whole week; a lot of amazing talks and the code sprints over the weekend: this is going to be a great time!

Are you coming?

Python: group a list in sub-lists of n items

2011-04-11T10:15:00.000-04:00

You have a long list, and you want to process its items n at a time; easy, but how do you split that list into sublists of n elements (except the last one, of course)?

I looked a bit into the stdlib, but there doesn't seem to be anything I could use (oh, did I say I'm still on 2.4?), so I directed my research to Google and found a nice recipe at ActiveState; its problem is that it discards the last list if it has fewer than n items.

Searching again, I got luckier with this article: it's a generator of tuples from a list, splitting every n elements and optionally returning the last semi-full tuple. I slightly modified it to obtain:

def group_iter(iterator, n=2):
    """ Given an iterator, it returns sub-lists made of n items
    (except the last that can have len < n)
    inspired by http://countergram.com/python-group-iterator-list-function"""
    accumulator = []
    for item in iterator:
        accumulator.append(item)
        if len(accumulator) == n: # tested as fast as separate counter
            yield accumulator
            accumulator = [] # tested faster than accumulator[:] = []
            # and tested as fast as re-using one list object
    if len(accumulator) != 0:
        yield accumulator

How would you have done it?
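One possible answer on a modern Python 3 (so not an option on the 2.4 mentioned above) is to let itertools.islice pull n items at a time. This is a sketch of an alternative, not the recipe from the linked article; the name group_iter_islice is made up here.

```python
from itertools import islice


def group_iter_islice(iterable, n=2):
    """Yield lists of n items from iterable; the last one may be shorter."""
    iterator = iter(iterable)
    while True:
        # islice consumes at most n items from the shared iterator
        chunk = list(islice(iterator, n))
        if not chunk:
            return
        yield chunk


groups = list(group_iter_islice(range(5), 2))
```

Like the generator above, it works on any iterable and never discards the final, shorter group.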

EuroPython 2011 @ Florence, IT - it's coming!

2011-01-21T02:40:00.004-05:00

Just when I was looking for a contact email to ask for news about the EuroPython 2011 dates... I noticed they are already there!! June 19 to 26!

Book your flights, reserve your vacations, hope to see you all there!

PS: this year, PyCon Italia joins EuroPython in a single conference - the bigger, the more fun

Re: Converting date to epoch

2010-12-29T03:37:00.002-05:00

Alexander, did you even consider that I might need to convert a date to epoch in a Python script? :)

Convert a date to epoch

2010-12-28T17:00:00.002-05:00

It seems like an easy quest, doesn't it? Well, it took me far too long to get it done, so let me just write it down, so that maybe I won't forget it in 2 seconds:

>>> import time
>>> date_str = '2009-03-04'
>>> fmt = '%Y-%m-%d'
>>> time.mktime(time.strptime(date_str, fmt))
1236121200.0
>>> int(_)
1236121200

and there you have your epoch, computed for midnight in the local timezone since that is how mktime() interprets the date (if you just need whole seconds, use int()).
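If the date should be read as UTC instead of local time, the standard library offers calendar.timegm(). The sketch below assumes Python 3 for the second variant (datetime.timezone does not exist on older versions); both helper names are made up for illustration.

```python
import calendar
import time
from datetime import datetime, timezone


def date_to_epoch_utc(date_str, fmt='%Y-%m-%d'):
    """Epoch seconds for date_str, treating it as midnight UTC."""
    # timegm() is the UTC counterpart of mktime()
    return calendar.timegm(time.strptime(date_str, fmt))


def date_to_epoch_utc_dt(date_str, fmt='%Y-%m-%d'):
    """Same result, going through a timezone-aware datetime."""
    parsed = datetime.strptime(date_str, fmt).replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


print(date_to_epoch_utc('2009-03-04'))
```

The value differs from the mktime() result by the local UTC offset in effect on that date.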

Feeling welcome: Python, you're doing it right!

2010-10-20T14:54:00.002-04:00

For a very long time (I think since PyconIT 2) I have wanted to contribute to Python. About a month and a half ago I started reading all the documentation I could find about development processes & workflows, Mercurial stuff (I don't have commit rights, so I prefer working on a DVCS rather than on SVN), bug management and so on.

After all of this, less than a month ago I started contributing very small patches; then, when looking for "something to do", I found bugs that could be closed, in particular because they had already been fixed, and then last night R. David Murray proposed to give me "tracker privs", which were granted moments later \o/

What can I say? Thank you! that's something making me feel like I'm doing something useful to the language I love, and even if the privs are "not that much", it's a sign of trust, which really makes me happy!

As of now, I feel like I'm somewhere between a "better be safe than sorry" and an "it's easier to ask forgiveness than permission" mood, but we'll see how it goes.

Python, you're making me feel welcome, and so, once again, thank you!

HTTP requests specifying the Host header (in Python)

2010-09-17T09:53:00.002-04:00

Since it took me some time to find this solution, I think it might be worth sharing.

When you have a web server listening on a single IP address but serving several domains, it's quite common to run:

$ curl -H "Host: domain" http://ip_address/path/to/the/page.html

if you need to view that domain by pointing directly at the web server (so avoiding any balancers or network "magic" you might have in place).

Well, the question is: how to do that in Python? I found my answer in the httplib module:

import httplib
conn = httplib.HTTPConnection("ip_address")
conn.putrequest("GET", "/", skip_host=True)
conn.putheader("Host", "domain.ext")
conn.endheaders()
res = conn.getresponse()
print res.read()
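On Python 3 the httplib module is called http.client, and the same putrequest()/putheader() calls are available there. The sketch below is a hedged port of the snippet above: the fetch_with_host() helper, the tiny local echo server and the demo() function are made up for illustration, so that the example can be tried without a real web server.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def fetch_with_host(ip, port, host, path="/"):
    """GET path from ip:port while sending a custom Host header."""
    conn = http.client.HTTPConnection(ip, port, timeout=5)
    try:
        # skip_host=True stops http.client from adding its own Host header
        conn.putrequest("GET", path, skip_host=True)
        conn.putheader("Host", host)
        conn.endheaders()
        res = conn.getresponse()
        return res.status, res.read()
    finally:
        conn.close()


class EchoHostHandler(BaseHTTPRequestHandler):
    """Illustration-only server: replies with the Host header it received."""

    def do_GET(self):
        body = self.headers.get("Host", "").encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def demo():
    # Port 0 asks the OS for any free port on the loopback interface
    server = HTTPServer(("127.0.0.1", 0), EchoHostHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        ip, port = server.server_address
        return fetch_with_host(ip, port, "domain.ext")
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() returns the status code and the Host value the server saw, which should be the one passed in rather than the IP address.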

HTH

I'm going to Pycon Italia 4

2010-03-26T05:41:00.003-04:00

Yep, I'll be there this year too (sadly I can't say it's my fourth time, since I missed Pycon1).

I'm not that excited about the proposed talks, hey we don't need that much of Django :) I hope the invited speakers will surprise me, I'm confident the organization will do that.

See ya there!

Check Nagios from the desktop: nagstamon

2010-03-17T07:23:00.003-04:00

I just discovered nagstamon and the whole team has fallen in love with it!

I tried to use Nagios Checker, the Firefox plugin that notifies you of any Nagios alert, but it doesn't play nicely with several open windows (alerts are multiplied by the number of windows, since it seems every window does the checks, not just one) and it tends to slow down Firefox, which is already quite slow per se :)

The upstream author provides a Debian package, so I promptly wrote to him asking if he would consider maintaining the package in Debian, with me as mentor/co-maintainer and so on (it's in Python, so I can do that :) ); let's see how it goes.

Give it a try, it's really simple and awesome!

Project Euler - Problem 14

2010-03-05T18:45:00.006-05:00

Inspired by S. Lott and his blog post (and by the pure genius that xkcd is giving us these days, today included) I took a look at Project Euler problem 14, which is about the Collatz conjecture.

The straightforward recursive solution:

def collatz(n):
    if n == 1:
        return 1
    if n % 2 == 0:
        # floor division keeps n an integer on both Python 2 and 3
        return 1 + collatz(n // 2)
    else:
        return 1 + collatz(3*n + 1)

very rapidly converges to... an error:

RuntimeError: maximum recursion depth exceeded

So I've recoded it a bit using a cache dictionary to store all the intermediate values in it:

cache = {1: 1}

def collatz(n, res):
    nn = n

    while True:
        if n in cache:
            if nn != n:
                cache[nn] = cache[n]
            return cache[nn]

        # floor division keeps n an integer on both Python 2 and 3
        if n % 2 == 0:
            cache[n] = 1 + collatz(n // 2, cache)
        else:
            cache[n] = 1 + collatz(3*n + 1, cache)

Now, looping over all the numbers lower than 1 million, we find the solution to the problem: the number with the longest path is 837799, with a path length of 524 steps (the code above also counts the starting number, so it reports 525). The code runs in about 4 seconds.
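For comparison, here is a sketch of a fully iterative variant in Python 3 that avoids recursion altogether: it walks the sequence until it hits a cached value, then fills the cache backwards. It is an alternative to the function above, not the code used for the timing quoted in the post, and the function names are made up. Like the cache seeded with {1: 1}, it counts terms, so 1 has length 1.

```python
def chain_length(n, cache):
    """Number of terms in the Collatz sequence starting at n."""
    path = []
    # Walk forward until we reach a number whose length is already known
    while n not in cache:
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    length = cache[n]
    # Walk back, storing the length of every number seen on the way
    for value in reversed(path):
        length += 1
        cache[value] = length
    return length


def longest_chain(limit):
    """Return (start, length) of the longest chain for 1 <= start < limit."""
    cache = {1: 1}
    best_start, best_length = 1, 1
    for start in range(1, limit):
        length = chain_length(start, cache)
        if length > best_length:
            best_start, best_length = start, length
    return best_start, best_length


print(longest_chain(10))
```

Calling longest_chain(1000000) covers the range asked for by the problem.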

Since I got that cache dict around to play with, I graphed with Matplotlib (strange, ha ;) ) the frequencies of each path length:
There are few short paths; long paths exist but are rare; and most of the paths have a length between 100 and 200 (more or less).

UPDATE: I've removed the length variable, which was not needed (left over from a previous version of the code), and fixed the path length for 837799, which was wrongly reported.

Convert a time string (with microseconds) in a datetime

2010-02-03T06:11:00.005-05:00

I struggled a bit with datetime & friends providing a strftime() that supports %f (for microseconds output) but not a strptime() that does. That's really annoying and counter-intuitive, and luckily it was fixed in 2.6:

$ python2.6
>>> import datetime
>>> datetime.datetime.strptime('22:57:39.101941', '%H:%M:%S.%f')
datetime.datetime(1900, 1, 1, 22, 57, 39, 101941)

but on my servers I still have 2.5, so what to do?

I found a reply on StackOverflow quite interesting:

$ python2.5
>>> from dateutil.parser import parser
>>> p = parser()
>>> p.parse('22:57:39.101941')
datetime.datetime(2010, 2, 3, 22, 57, 39, 101941)

OK, it uses the current date (instead of 1900-01-01) but it still has the microseconds correctly recognized (and since I need a time diff, I'm fine with that).
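If dateutil is not available either, a stdlib-only workaround is to parse the part before the dot with strptime() and add the microseconds by hand. This is a sketch of that idea, not the StackOverflow answer mentioned above; the helper name is made up, and the code is written for Python 3 although it only relies on basic str and datetime methods.

```python
from datetime import datetime


def parse_time_with_micros(text, fmt='%H:%M:%S'):
    """Parse 'HH:MM:SS.ffffff' without relying on %f support in strptime()."""
    base, _, fraction = text.partition('.')
    parsed = datetime.strptime(base, fmt)
    if fraction:
        # Pad (or cut) the fraction to exactly 6 digits of microseconds
        micros = int(fraction[:6].ljust(6, '0'))
        parsed = parsed.replace(microsecond=micros)
    return parsed


start = parse_time_with_micros('22:57:39.101941')
end = parse_time_with_micros('22:57:40.5')
print(end - start)
```

As with the first example, the date part defaults to 1900-01-01, which does not matter when only the difference between two times is needed.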

Is being pirated sign of success?

2010-01-07T14:40:00.005-05:00

I don't know (and I don't think so) but for sure it's less money in my pocket :)

Anyhow, just the other day I found my book was available for "free download" in a post on a rapidshare forum.

It doesn't bother me too much, it's just a strange feeling...

Matplotlib for Python Developers - Images available for download

2009-12-29T05:29:00.004-05:00

This is a book on a graphics library, so when they first told me it would be printed in black & white I was quite surprised and puzzled. The editor explained it was to reduce the cost of the paper copy, and that the PDF version would still be in full color, but I would be quite upset if I bought the book and then discovered it was black & white without knowing it upfront.

Thus I've asked to at least let the images be downloadable from the book website, so that any reader (either for the paper or electronic copy) can see the pictures as if they're running the programs.

The first request was ignored, but persistence paid off: now the images can be downloaded!! To get them, go to the book webpage, then to the "Code Download" section, and request the zip file: in it you'll find both the source code and the images.

The pictures are quite big, because they are the same used for book production (so with specific dimensions and DPI) but you got colors now :)

Matplotlib for Python Developers - PUBLISHED!

2009-11-21T12:21:00.006-05:00

Some days have passed, but I'm still pleased to announce that

The first book about Matplotlib has been PUBLISHED!!

It was a really nice experience, it offered me the possibility to work on Matplotlib, do some really interesting stuff, and I'm quite proud of it :)

On the other hand, it was not a "straight" road: the effort I put into this was huge, I practically had to stop all the other things and projects I was working on (Debian included) and I was getting more and more tired as time passed. Also, sometimes Packt's employees and actions were somewhat problematic. But anyhow, the important thing is that THE BOOK IS OUT!!

Now I've also got a nice box about the book on the sidebar of this blog!

Enjoy it!!

The beast is quite done

2009-09-07T05:54:00.004-04:00

At the end, I made it: the book is in pre-order phase!! yayyy

I'm at the end of reviewing chapters, so I can see the light at the end of the tunnel :)

Probably I'll be able to sleep more than 4/5 hrs at night and have additional spare time to work on the projects I've neglected in these months

Web scraping with Python for fun and profit

2009-06-22T10:13:00.003-04:00

The web is everywhere, we know. It is also used more and more to present information to a wide audience. Sadly, it is commonly the only way data is presented...

That said, we need to get that info; the process of extracting information from web pages is known as web scraping, and note that it is a very fragile process: every time the webpage changes, it's likely you'll have to modify the code that parses it.

Probably the most famous Python module for web scraping is BeautifulSoup. While it might be nice for simple webpages, I found it really hard to get something done for more complex pages, in particular those with embedded JavaScript.

Thanks to Ian's blog post, I discovered how nice it is to use lxml for web scraping, in particular together with the Firebug Firefox addon: it's just a simple process of:

take the page;
generate the lxml tree;
with Firebug find the XPath to the element you need;
loop / parse / have fun :)

If you find yourself needing to scrape a web page, give lxml a try: you'll be surprised and satisfied!
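The four steps above can be sketched as follows. To keep the example self-contained it uses the standard library's xml.etree.ElementTree as a stand-in for lxml: ElementTree only understands a subset of XPath and needs well-formed markup, which real pages rarely are, so this only illustrates the workflow. The page content, the table id and the scrape_rows() helper are all made up.

```python
import xml.etree.ElementTree as ET

# Step 1: "take the page" (a made-up, well-formed snippet instead of a download)
PAGE = """<html><body>
<table id="prices">
<tr><td>apples</td><td>1.20</td></tr>
<tr><td>pears</td><td>0.80</td></tr>
</table>
</body></html>"""


def scrape_rows(page, xpath):
    """Return the cell texts of every row matched by xpath."""
    # Step 2: generate the tree
    tree = ET.fromstring(page)
    rows = []
    # Step 4: loop over the matches and parse them
    for row in tree.findall(xpath):
        rows.append(tuple(cell.text for cell in row.findall("td")))
    return rows


# Step 3: the XPath you would normally find by inspecting the page
rows = scrape_rows(PAGE, ".//table[@id='prices']/tr")
print(rows)
```

With lxml the shape is the same: lxml.html.fromstring() builds the tree from real-world HTML and the element's xpath() method accepts full XPath expressions.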

Update on the Matplotlib book

2009-05-28T04:28:00.004-04:00

Some people asked me how the book writing is going. Ah, you're right, it's been a loooong time since I said anything about the progress, so here it is.

I've just passed the halfway point of the book: about 10 days ago I sent in the 5th chapter (out of 10), and up to now I've covered these (high-level) contents, chapter by chapter:

installation & setup
first contact with mpl
some more stuff (like graph types)
OO style and some advanced things
GTK embedding

6th (and current) chapter is about Qt, so even KDE guys will be happy :)

I ran into several problems when working with GUI design programs: both Glade and Qt Designer made me scream a lot; but probably it's just that I'm not so used to GUIs :)

Other topics will be: Wx, web, real use cases.

In particular for the last part (real cases) I'd like to hear some proposals from you. I've already got something in mind, but users' opinions will help me direct my work better.

Sorry, I've got to "drop/reduce" the science chapter (it was superseded by the real use-cases one) since I don't think this is the right place for it. Of course, a couple of "science examples" might come into the mentioned chapter, but your proposals have to be at a low-medium level.