Python Beginners (1)

Notes on some key points from learning Python with Dataquest.

1. csv.reader()

import csv
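# "with" closes the file automatically; the csv docs recommend newline="" when opening
with open("file.csv", "r", newline="") as f:
    data = list(csv.reader(f))  # each row becomes a list of strings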
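csv.reader() yields each row as a list of strings, so the header row and the data rows come back the same way. The minimal sketch below uses an in-memory io.StringIO object with made-up contents (an assumption, standing in for a real file.csv) so that it runs without a file on disk:

```python
import csv
import io

# Hypothetical CSV content standing in for file.csv
sample = io.StringIO("name,score\nalice,90\nbob,85\n")

rows = list(csv.reader(sample))
header, body = rows[0], rows[1:]
print(header)  # ['name', 'score']
print(body)    # [['alice', '90'], ['bob', '85']]

# Every value is a string; convert explicitly before doing arithmetic
total = sum(int(row[1]) for row in body)
print(total)   # 175
```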

2. regex

Some small regex matching exercises:

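# simple match: . matches any single character and ^ anchors the match at the start,
# so ^b.tter matches every string in strings but not bad_string
strings = ["better not put too much", "butter in the", "batter"]
bad_string = "We also wouldn't want it to be bitter"
regex = "^b.tter"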
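# \ is the escape character; here the pattern matches [Serious], (serious),
# or any other combination of bracket style and capitalisation
# posts is the course dataset (not shown); row[0] holds the post title
import re
serious_count = 0
for row in posts:
    if re.search(r"[\[\(][Ss]erious[\]\)]", row[0]) is not None:
        serious_count += 1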
# pattern substitution: replace every match of [\[\(][Ss]erious[\]\)] in row[0] with [Serious]
import re
posts_new = []
for row in posts:
    row[0] = re.sub(r"[\[\(][Ss]erious[\]\)]", "[Serious]", row[0])
    posts_new.append(row)
# re.findall returns every matching substring; here the pattern matches the years 1000-2999.
# {3} means the preceding element ([0-9]) repeats exactly three times
import re
years = re.findall(r"[1-2][0-9]{3}", years_string)
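The three exercises above can be tried end to end on made-up data. In this sketch, posts is a hypothetical list of rows whose first column is a post title, and years_string is an invented sentence; the original exercises run on a course dataset that is not shown here:

```python
import re

# . matches any single character; ^ anchors the match at the start of the string
strings = ["better not put too much", "butter in the", "batter"]
bad_string = "We also wouldn't want it to be bitter"
matches = [s for s in strings if re.search(r"^b.tter", s)]
print(matches)                            # all three strings
print(re.search(r"^b.tter", bad_string))  # None: "bitter" is not at the start

# Hypothetical posts: only the first column (the title) matters here
posts = [["[Serious] What is your job?"], ["(serious) Advice needed"], ["Just a joke"]]
serious_count = 0
for row in posts:
    if re.search(r"[\[\(][Ss]erious[\]\)]", row[0]) is not None:
        serious_count += 1
print(serious_count)  # 2

# Normalise every variant to "[Serious]"
for row in posts:
    row[0] = re.sub(r"[\[\(][Ss]erious[\]\)]", "[Serious]", row[0])
print(posts[1][0])  # [Serious] Advice needed

# Hypothetical text containing years; 3050 is skipped because it starts with 3
years_string = "Released in 1998, remade in 2015, set in 3050."
years = re.findall(r"[1-2][0-9]{3}", years_string)
print(years)  # ['1998', '2015']
```

Raw strings (r"...") keep Python from interpreting the backslashes before the regex engine sees them.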

3. Datetime

A Unix timestamp is a floating-point value with no explicit mention of day, month, or year. It represents the number of seconds that have passed since the "epoch", 00:00:00 UTC on January 1, 1970. So a timestamp of 0.0 represents the epoch, and a timestamp of 60.0 represents one minute after the epoch. Any moment after the epoch can be represented this way.

To retrieve the current Unix timestamp, we use the time.time() function.
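Because a timestamp is just a number of seconds, time arithmetic on it is plain addition. A short sketch (the one-day offset is an arbitrary example):

```python
import time

# Current Unix timestamp: seconds since the epoch, as a float
now_ts = time.time()
print(type(now_ts))  # <class 'float'>

# Moving forward one day means adding 24 * 60 * 60 = 86400 seconds
one_day = 24 * 60 * 60
tomorrow_ts = now_ts + one_day
print(tomorrow_ts - now_ts)
```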

Converting Time
We can convert a timestamp to a more human-readable form using the time.gmtime() function. This function takes a timestamp as an argument and returns an instance of the struct_time class. struct_time instances have attributes that describe that moment in other ways.
Here are some of the attributes:

  • tm_year: The year of the timestamp
  • tm_mon: The month of the timestamp (1-12)
  • tm_mday: The day in the month of the timestamp (1-31)
  • tm_hour: The hour of the timestamp (0-23)
  • tm_min: The minute of the timestamp (0-59)

For example, we can retrieve the year as an integer using the tm_year attribute:

import time
current_time = time.time()
current_struct_time = time.gmtime(current_time)
current_year = current_struct_time.tm_year
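Passing fixed timestamps to time.gmtime() makes the struct_time attributes easy to check. This sketch uses the epoch itself and the timestamp 1433213314.0 that appears later in these notes; called with no argument, time.gmtime() uses the current time instead:

```python
import time

# gmtime(0.0) is the epoch itself
epoch = time.gmtime(0.0)
print(epoch.tm_year, epoch.tm_mon, epoch.tm_mday)  # 1970 1 1

# A later timestamp, broken down into UTC fields
st = time.gmtime(1433213314.0)
print(st.tm_year, st.tm_mon, st.tm_mday, st.tm_hour, st.tm_min)  # 2015 6 2 2 48
```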

Note the value of the hour: time.gmtime() always returns UTC time (time.localtime() is its local-time counterpart). UTC stands for Coordinated Universal Time, the standard reference time commonly used in programming. It corresponds to mean solar time at 0° longitude, i.e. Greenwich Mean Time, and does not observe daylight saving time. While we can convert UTC to other time zones, we'll use UTC in these notes for simplicity.
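Converting from UTC to another time zone can be sketched with the standard library's fixed-offset datetime.timezone. The UTC+8 offset below is an arbitrary choice for illustration; real zones with daylight saving rules would need the zoneinfo module instead:

```python
import datetime

utc = datetime.timezone.utc
# Hypothetical target zone: a fixed UTC+8 offset (no daylight saving rules)
utc_plus_8 = datetime.timezone(datetime.timedelta(hours=8))

moment_utc = datetime.datetime(2015, 6, 2, 2, 48, 34, tzinfo=utc)
moment_local = moment_utc.astimezone(utc_plus_8)

print(moment_utc)                  # 2015-06-02 02:48:34+00:00
print(moment_local)                # 2015-06-02 10:48:34+08:00
print(moment_utc == moment_local)  # True: same instant, different wall clock
```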

The datetime module has a datetime class that represents points in time. datetime instances appear similar to struct_time instances, and have the following attributes:

  • year
  • month
  • day
  • hour
  • minute
  • second
  • microsecond

To get the current datetime, we use the datetime.datetime.now() function, which returns a datetime.datetime instance in local time.

import datetime
current_datetime = datetime.datetime.now()
print(current_datetime)
current_year = current_datetime.year
current_month = current_datetime.month

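Since now() changes on every run, a fixed datetime built with the datetime.datetime constructor makes the attributes listed above easy to inspect. The date below is an arbitrary example:

```python
import datetime

# A fixed datetime so the output is predictable
dt = datetime.datetime(2015, 12, 31, 23, 59, 58, 500000)
print(dt.year, dt.month, dt.day)      # 2015 12 31
print(dt.hour, dt.minute, dt.second)  # 23 59 58
print(dt.microsecond)                 # 500000
```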
Time Lapse
Use datetime.timedelta to represent the difference between two datetime instances.
The parameters timedelta accepts include:

  • weeks
  • days
  • hours
  • minutes
  • seconds
  • milliseconds
  • microseconds
import datetime
today = datetime.datetime.now()
diff = datetime.timedelta(days=1)
tomorrow = today + diff
yesterday = today - diff
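Several timedelta parameters can be combined in one call, and subtracting two datetime instances gives a timedelta back. A sketch with an arbitrary fixed start date so the results are checkable:

```python
import datetime

start = datetime.datetime(2015, 12, 31, 0, 0)
# 1 week + 2 days + 3 hours, combined in one timedelta
delta = datetime.timedelta(weeks=1, days=2, hours=3)

later = start + delta
print(later)  # 2016-01-09 03:00:00

# datetime - datetime -> timedelta
gap = later - start
print(gap.days, gap.seconds)  # 9 10800
print(gap.total_seconds())    # 788400.0
```

timedelta normalises everything into days, seconds, and microseconds internally, which is why the three hours show up as 10800 seconds.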

Formatted Datetime
The .strftime() method turns a datetime instance into a string laid out by a format string; see the Python documentation for the full list of format codes.

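import datetime
mystery_date = datetime.datetime(2015, 12, 31, 0, 0)  # assumed value, inferred from the result below
mystery_date_formatted_string = mystery_date.strftime("%I:%M%p on %A %B %d, %Y")
print(mystery_date_formatted_string)
# and the result is "12:00AM on Thursday December 31, 2015"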
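A few common format codes applied to the same date. The mystery_date value here is an assumption chosen to match the result shown above, and the weekday and month names assume Python's default "C" locale (English names):

```python
import datetime

# Assumed value, matching the result shown above
mystery_date = datetime.datetime(2015, 12, 31, 0, 0)

# %Y 4-digit year, %m month, %d day, %y 2-digit year, %H 24-hour hour, %M minute
print(mystery_date.strftime("%Y-%m-%d"))        # 2015-12-31
print(mystery_date.strftime("%d/%m/%y %H:%M"))  # 31/12/15 00:00

# %I 12-hour hour, %p AM/PM, %A weekday name, %B month name
print(mystery_date.strftime("%I:%M%p on %A %B %d, %Y"))
```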

Transform strings to datetime format
Just as we can convert a datetime object into a formatted string, we can also do the reverse. The datetime.datetime.strptime() function converts a string to a datetime instance. It takes two arguments:

  • The date string (e.g. "Mar 03, 2010")
  • The format string (e.g. "%b %d, %Y")
import datetime
mystery_date_formatted_string = "12:00AM on Thursday January 02, 2003"
mystery_date = datetime.datetime.strptime(mystery_date_formatted_string, "%I:%M%p on %A %B %d, %Y")
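Using the two example arguments listed above, strptime() and strftime() round-trip with the same format string, and a string that does not fit the format raises ValueError. A short sketch:

```python
import datetime

fmt = "%b %d, %Y"
parsed = datetime.datetime.strptime("Mar 03, 2010", fmt)
print(parsed)                # 2010-03-03 00:00:00
print(parsed.strftime(fmt))  # Mar 03, 2010

# A string in a different layout does not match the format
try:
    datetime.datetime.strptime("2010-03-03", fmt)
except ValueError as err:
    print("could not parse:", err)
```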

Still another approach
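datetime.datetime.fromtimestamp() converts a Unix timestamp directly into a datetime instance, expressed in the machine's local time; printing it shows the default "yyyy-mm-dd hh:mm:ss" format.

datetime_object = datetime.datetime.fromtimestamp(1433213314.0)
# prints as 2015-06-02 02:48:34 when the local time zone is UTC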
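Because fromtimestamp() uses the local time zone by default, the printed value can differ between machines. Passing tz=datetime.timezone.utc pins the result to UTC, and .timestamp() converts back. A sketch with the timestamp used above:

```python
import datetime

ts = 1433213314.0
as_utc = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
print(as_utc)        # 2015-06-02 02:48:34+00:00
print(as_utc.month)  # 6

# Going back: .timestamp() recovers the original number
print(as_utc.timestamp())  # 1433213314.0
```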

For a value that is already a datetime instance, the .month attribute returns the month as an integer.

# here row[2] is already a datetime instance (it prints like 2015-06-02 02:48:34), not a string
march_count = 0
for row in posts:
    print (row[2])
    if row[2].month == 3:
        march_count += 1
print (march_count)
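.month only exists on datetime instances, so a date column that is still a string needs converting first. In this sketch, posts is a hypothetical list of rows with the creation time as a string in column 2; strptime() turns it into a datetime before counting:

```python
import datetime

# Hypothetical rows: [title, author, created_at as a string]
posts = [
    ["a", "x", "2015-03-01 10:00:00"],
    ["b", "y", "2015-03-15 18:30:00"],
    ["c", "z", "2015-06-02 02:48:34"],
]

# Convert the string column to datetime instances in place
for row in posts:
    row[2] = datetime.datetime.strptime(row[2], "%Y-%m-%d %H:%M:%S")

march_count = 0
for row in posts:
    if row[2].month == 3:
        march_count += 1
print(march_count)  # 2
```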

4. Set

Use set to get unique values from a list.

lunchtype = set(lunch)  # lunch: a list with repeated values; the set keeps one copy of each
print(lunchtype)
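A set drops duplicates but has no defined order. This sketch uses a hypothetical lunch list and shows sorting the unique values, plus dict.fromkeys() as an order-preserving alternative (dicts keep insertion order in Python 3.7+):

```python
# Hypothetical list with repeated values
lunch = ["standard", "free/reduced", "standard", "standard", "free/reduced"]

lunchtype = set(lunch)
print(len(lunchtype))     # 2

# Sets are unordered; sort them when a stable order is needed
print(sorted(lunchtype))  # ['free/reduced', 'standard']

# Order-preserving alternative: dict keys keep first-seen order
print(list(dict.fromkeys(lunch)))  # ['standard', 'free/reduced']
```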