Grungy's Blog

Star Date 242011809.2220: "Keep the Roads Rolling"

This is a somewhat belated, but luckily not posthumous, entry after last week. I'm waiting for the next contender to show up and decimate me (please, not another middle schooler with a sticky lunch box)… oh right, ShmooCon… things have to get done. On to that:

Accomplishments? :

  • Simplifying
  • Counting (integers)

Two accomplishments, impressive, right? Not in the least, considering each could have been avoided by watching Sesame Street instead of Barney as a little kid. I can't be faulted for the video choices at the daycare, can I?

An explanation of the problem is in order, I think:

Simplifying:

Okay, there's a problem. Look at the page source and see this:

Code Segment 1:

<td width="100">
  <div align="center"><b>Incident #</b></div>
</td>

Alright, someone wanted to make the table look a little prettier and center the column headings. No problem. It just throws another monkey wrench in the plans, but hey, that's okay, I like monkeys (I always feel bad for the ones in drive-through zoos that cling to the tops of cars… that rake guy is just brutal). This new styling can be worked around. So ponder some and write this up:

Code Segment 2:

def start_div(self, attrs):
    #stop div from breaking up tag
    if self.rip:
        try:
            #Try block is to handle when the first pop results in an empty list --> aka
            #the first element in the list
            self.count -= 2
            #print "This is what is being popped: "
            #print self.pieces.pop()
            #print self.pieces.pop()
            print "Inside start_div: "
            print self.pieces
        except IndexError:
            #if second pop results in an error do nothing because all that needs to be done
            #is already done
            self.count += 1
            pass

def end_div(self):
    if self.rip:
        #self.count -= 1
        divTest = ""
        try:
            #try block handles the case where pieces is empty
            divTest = str(self.pieces[-1])
            print "This is divTest: %s" % divTest
            if divTest[-1] == ",":
                #self.pieces[-1] = divTest.replace(",","")
                self.pieces[-1] = divTest.strip(",")
                print "entityref2: " + self.pieces[-1]
                #self.count -= 1
        except IndexError:
            pass

Okay, so Code Segment 2 is kind of like the paragraph code (the start_p function), just a little modified. I'll admit it looks pretty nasty, like I could cut open a can on those jagged edges. Do I stop trying this solution after realizing that? Nah. Determination (or stubbornness), hurrah. Needless to say, lots of weird output ensued. The short story is that this code didn't modify the count variable properly, so column placement came out wrong. It was pushing count to values like -2, which is way out of range.

Damn, back to the notebook…

*pound head into notebook…no effect*

A lot of pondering happened: about what Nick said about simplifying processes, about how land in Wyoming is cheap but arid looking, and about my friends losing their jobs at Circuit City :(

This finally resulted in the realization that all I needed to worry about was the space that precedes the <div> tag and the one that follows it. To do that I wrote this code:

Code Segment 3:

if text == " ":
    #handles the case of a div being there and it including
    #a space before and after
    #the data you want between the div tag or something
    print "***Not including space"

This if statement is the first in the else-if chain (Code Segment 4) dealing with count and which column the text is placed in. Whenever the text is only a space, it does nothing beyond printing a debug message, so count is left alone. This is the same result I was trying for with Code Segment 2, but it's really a one-liner instead of two functions. Simple. Love it.

Code Segment 4:

          if text == " ":
              #handles the case of a div being there and it including a space
              # before and after
              #the data you want between the div tag or something
              print "***Not including space"
  
          elif self.count == 0:
              #tack date onto first cell
              print "count0: " + self.date + "," + text + ","
              self.pieces.append(self.date + "," + text + ",")
              self.count += 1
             
          elif self.count == 1:
              #grab the time from this column already have the date
              self.count += 1
              temp = text.split()
              try:
                  self.pieces.append(temp[1] + ",")
              except IndexError:
                  #This will happen when it's the first line and text is "Description"
                  #hopefully this is the only case
                  pass
          elif self.count == 5:    #4 for 2003             
              #adds a newline after each complete entry
              self.pieces.append(text + "\n") 
              self.count = 0
              self.centerCount = 0
              print "count5: " + text
              
              if self.brFlag > 0:
                   #handles the case of multiple incident#'s in one cell
                   #By saving the previous incident# into tempHolding whenever a <br> tag is found
                   #and then grabbing the entire last line minus the incident# and date, it is possible to
                   #tack on the needed numbers and dates with the for loop.
                   #Note: the date is stored in the same cell of the list as the Incident#
                   #This case is seen on May 14 2003
                   self.pastEntry = self.pieces[-4:-1]
                   self.pastEntry.extend([self.pieces[-1]])
                   #print "This is pastEntry: %s" % self.pastEntry
     
                   for index in range(self.brFlag):
                       #print "This is brFlag: %s" % self.brFlag
                       self.pieces.extend( self.tempHolding[index] )
                       self.pieces.extend(self.pastEntry)
                   self.brFlag = 0
     
          else:
              self.pieces.append(text + ",") #comma is for delimiter in .csv
              self.count += 1
              print "else: " + text
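If you want to poke at the counting idea without the whole parser, here is a stripped-down Python 3 sketch of the else-if chain above. RowBuilder, add_text and last_col are names I made up for illustration, the sample cells are invented, and the time-splitting and <br> handling are left out, so treat it as a toy rather than the real thing:

```python
class RowBuilder:
    """Toy version of the count-based column placement (hypothetical helper)."""

    def __init__(self, date, last_col=5):
        self.date = date          # date string tacked onto the first cell
        self.last_col = last_col  # index of the final column (5 above, 4 for 2003)
        self.count = 0            # which column the next piece of text belongs to
        self.pieces = []          # CSV fragments collected so far

    def add_text(self, text):
        if text == " ":
            # lone space left around a <div>: skip it and leave count untouched
            return
        if self.count == 0:
            # first column: prefix the date
            self.pieces.append(self.date + "," + text + ",")
            self.count += 1
        elif self.count == self.last_col:
            # last column: close the row and start counting again
            self.pieces.append(text + "\n")
            self.count = 0
        else:
            self.pieces.append(text + ",")
            self.count += 1


# Made-up three-column row with stray spaces around the first cell.
builder = RowBuilder("2004-04-08", last_col=2)
for chunk in [" ", "04-0001", " ", "Theft", "Closed"]:
    builder.add_text(chunk)
print("".join(builder.pieces), end="")
```

The point of the early return is that a skipped space never touches count, which is exactly what the two div functions were failing to guarantee.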

Counting (integers)

These are the column headings from January 1, 2003 - April 7, 2004:

Incident #     Time     Description     Location     Disposition

These are the column headings from April 8, 2004 - January 16, 2009:

Incident #     Time     Occurred Date/Time     Description     Location     Disposition

A quick count of the column headings shows that there are 5 in the first and 6 in the second. Easy, Sesame Street easy.

Unfortunately, I did most of my testing on April 9, because most of the dates after April 7 are just empty tables. This proved to be a poor choice, because the column headings for April 9 look like this:

Incident #     Occurred Date/Time     Description     Location     Disposition

A quick (too quick) count of this shows that there are 6 column headings, just like the rest of them after April 8, 2004. After running the code, errors crop up in the placement of data. Why? Did you actually count it? There are 5 column headings. That stumped me for a while.
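One way to stop trusting my own counting would be to let the script count the heading cells itself. The Python 3 sketch below does that with the standard library's html.parser. HeaderCounter and count_columns are hypothetical names, and the sample markup is only my guess at the page structure based on Code Segment 1, so it would need adjusting for the real pages:

```python
from html.parser import HTMLParser


class HeaderCounter(HTMLParser):
    """Counts the cells in the first table row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_first_row = False
        self.done = False
        self.columns = 0

    def handle_starttag(self, tag, attrs):
        if self.done:
            return  # only the first row matters
        if tag == "tr":
            self.in_first_row = True
        elif tag in ("td", "th") and self.in_first_row:
            self.columns += 1

    def handle_endtag(self, tag):
        if tag == "tr" and self.in_first_row:
            self.in_first_row = False
            self.done = True


def count_columns(html):
    """Return the number of cells in the first <tr> of the given HTML."""
    counter = HeaderCounter()
    counter.feed(html)
    return counter.columns


# Assumed markup for the April 9 headings, followed by one data row.
sample = (
    "<table><tr>"
    '<td><div align="center"><b>Incident #</b></div></td>'
    '<td><div align="center"><b>Occurred Date/Time</b></div></td>'
    '<td><div align="center"><b>Description</b></div></td>'
    '<td><div align="center"><b>Location</b></div></td>'
    '<td><div align="center"><b>Disposition</b></div></td>'
    "</tr><tr><td>x</td><td>x</td><td>x</td><td>x</td><td>x</td></tr></table>"
)
print(count_columns(sample))  # 5
```

The last-column value in the else-if chain could then be count_columns(page) - 1 instead of a hardcoded 4 or 5. Nested tables would throw this off, so it is a starting point, not a fix.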

Sesame Street vs. Barney

How does this relate to Sesame Street and Barney? It doesn't, very much… but Barney teaches imagination and has extravagant make-believe games. Sesame Street, on the other hand, has short clips of simple games and teaches you basics like counting. What it really boils down to is: keep it simple and remember the basics. Don't overthink the problem, because it's not a make-believe game…

Conclusion

This was a colossal waste of time. We should all K.I.S.S. each other's hardware/software and not let this happen to each other. As long as we keep the ball rolling, we can finish this in time for ShmooCon. So please, someone bother me to look at my code and make suggestions (it is also in svn).

svn co svn+ssh://username@udarknet.com/var/svn/crimetracker

Will hopefully have spectacular news to update this blog with soon.

NOTE: The next blog entry will be more of a comprehensive summary of last week and not just this weekend. I wanted to write this first for, well, my benefit… yeah, it's all about me.

Star Date: 2421102009.046

This week proved to be fairly productive. I would have liked to have the database finished by now, and indeed there were times (strike that, most times) I thought it was done, only to realize I had to handle another case to mine the correct data. I'm going to include a copy of the code changes for posterity at the end (largely in case I delete something accidentally… but others are welcome to, um, enjoy it as well), but the major changes that have been dealt with were:

  • MS-DOS style line return removal (as I'm running this on a sweet Ubuntu box)
  • proper handling of an HTML entity reference between two desired pieces of text. The previous version had the annoying habit of re-arranging the order of data elements in the file in some cases… Its inefficiency was excised…
  • an updated case to grab the date from the page and include it in every entry; it uses regular expressions rather than a count to find the right center tag. This was necessary because the date was not always found in the same center tag… smells of Dreamweaver to me…
  • with help from hal0 (hope it doesn't get an “n” on the end), cycling through all of the pages (even across years and months) was made much easier by using the datetime module in Python and some clever string formatting on his part. The datetime module is pretty amazing, as it handles the number of days in each month and everything. This way it is possible to generate the next URL to visit without having to first direct it to a month page and have it follow all of the links.
  • created a regular expression that finds any run of 2 to 20 spaces (20 being an arbitrarily large number) and reduces it to 1. This was placed in the MS-DOS line return removal case in order to clean up the spaces used to align the text in the original table.
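For the curious, the date stepping and the text cleanup from the list above can be sketched in a few lines of Python 3. This is not the code in svn: URL_TEMPLATE is a made-up placeholder (the real site's URL layout isn't shown here), and daily_urls and clean_text are hypothetical names.

```python
import re
from datetime import date, timedelta

# Hypothetical URL pattern; the real site's layout is an assumption.
URL_TEMPLATE = "http://example.invalid/reports/{:%Y/%m/%d}.html"


def daily_urls(start, end):
    """Yield one URL per day from start to end, inclusive."""
    day = start
    while day <= end:
        # date objects accept strftime codes as a format spec
        yield URL_TEMPLATE.format(day)
        # timedelta handles month lengths, leap years and year rollovers
        day += timedelta(days=1)


def clean_text(text):
    """Turn MS-DOS line endings into plain newlines, then squeeze space runs."""
    text = text.replace("\r\n", "\n")
    # runs of 2 to 20 spaces become a single space
    return re.sub(r" {2,20}", " ", text)


for url in daily_urls(date(2004, 2, 28), date(2004, 3, 1)):
    print(url)
print(clean_text("04-0001     Theft\r\n"), end="")
```

A run longer than 20 spaces would still leave more than one space behind with this pattern; something like " {2,}" would drop the upper limit.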

Still need to make a new format for years 2005-2009 because another column was added to the table. Also need to figure out how to interface with Google Maps so a nice visual of the police reports can be seen. If possible, I'd like to contact public safety and find out the locations and models of the cameras already in place. I doubt they'll tell me the models, but hopefully I'll get some locations. If not, well, I always liked Where's Waldo.

With any luck and a good hustle we'll breach their lines by next week.

Regards,

Captain John Paul Jones

*Instead of including a copy of the code, I'm including this link to a nicely formatted view of what has been changed and added (thanks afterburn).

link to code changes

· 2009/01/10 00:56 · grungy
 
users/grungy/blog.txt · Last modified: 2009/01/09 23:17 by afterburn
 
 