{"id":468,"date":"2016-09-20T21:27:58","date_gmt":"2016-09-21T03:27:58","guid":{"rendered":"http:\/\/lucasmanual.com\/blog\/?p=468"},"modified":"2019-06-29T00:14:06","modified_gmt":"2019-06-29T06:14:06","slug":"quick-intro-to-cassandra-vs-mongodb-with-python","status":"publish","type":"post","link":"http:\/\/lucasmanual.com\/blog\/quick-intro-to-cassandra-vs-mongodb-with-python\/","title":{"rendered":"Quick Intro to Cassandra vs MongoDB with python"},"content":{"rendered":"<p><img decoding=\"async\" src=\"https:\/\/victorops.com\/wp-content\/uploads\/2015\/04\/apache-cassandra.gif\" alt=\"Cassandra NoSQL\" \/><\/p>\n<ul><strong>Cassandra Conclusion:<\/strong>\n<li>&#8220;One way that Cassandra deviates from Mongo is that it offers much more control on how it\u2019s data is laid out. Consider a scenario where we are interested in laying out large quantities of data that are related, like a friend\u2019s list. Storing this in MongoDB can be a bit tricky \u2013 it\u2019s not great at storing lists that are continuously growing. If you don\u2019t store the friends in a single document, you end up risking pulling data from several different locations on disk (or on different servers) which can slow down your entire application. Under heavy load this will impact other queries being performed concurrently.&#8221;[1] <\/li>\n<li>If your project is mature and stores long runs of consecutive data that you will later want to read back without jumping between disks, Cassandra looks like a strong candidate for queries such as:\n<ol>\n<li>Show the last 50 items for &#8220;TheMostInterestingPersonInTheWorld&#8221;: item1,item2,..item3000..<\/li>\n<li>Show me the latest comments on &#8220;TheLucasMovie&#8221;: comment1,comment2,comment3,<\/li>\n<li>Show the water level in Louisiana RiverIoT: level at 8am,level at 8:01am,level at 8:02am, x 100-1000 locations<\/li>\n<\/ol>\n<\/li>\n<li>Great if your data structure is already set up and fits the model above. 
[2][3]<\/li>\n<\/ul>\n<hr>\n<p><img decoding=\"async\" src=\"https:\/\/upload.wikimedia.org\/wikipedia\/en\/thumb\/4\/45\/MongoDB-Logo.svg\/450px-MongoDB-Logo.svg.png\" alt=\"MongoDB\" \/><\/p>\n<ul>\n<strong>MongoDB Conclusion:<\/strong>\n<li>No fixed structure required: import pymongo, get a database (db = client.myawesomedatabase), pick a collection, and start inserting documents. Done.<\/li>\n<li>You have a project and you are not sure how NoSQL will handle it, but you want to try it. [4]<\/li>\n<li>You have a working process, but it&#8217;s grown to the point where a traditional RDBMS can&#8217;t handle the IO load. [5]<\/li>\n<li>You don&#8217;t have time to create table structures just now; you just want to get going and see what happens. <\/li>\n<li>You want to find Python documentation fast and benefit from a large community&#8217;s examples.<\/li>\n<\/ul>\n<hr>\n<p><img decoding=\"async\" src=\"https:\/\/assets.toptal.io\/uploads\/blog\/category\/logo\/244\/casandra.png\" alt=\"Cassandra Python\" \/><br \/>\n<strong>Cassandra Code in Python; Details:<\/strong><br \/>\nInstallation:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n#Add the following cassandra repo line to \/etc\/apt\/sources.list:\r\n#deb http:\/\/www.apache.org\/dist\/cassandra\/debian 37x main\r\nsudo apt-get update\r\nsudo update-alternatives --config java  #pick openjdk 8\r\nsudo apt-get install cassandra\r\n#status\r\nnodetool 
status\r\nnodetool info\r\nnodetool tpstats\r\n#python\r\nvirtualenv -p python3 env_py3\r\nsource env_py3\/bin\/activate\r\npip install cassandra-driver\r\n<\/pre>\n<p>Python:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nfrom cassandra.cluster import Cluster\r\n#connect to the local node (defaults to 127.0.0.1)\r\ncluster = Cluster()\r\nsession = cluster.connect()\r\n\r\n#nodetool status\r\n#nodetool info\r\n#nodetool tpstats\r\n\r\n\r\n#https:\/\/github.com\/dkoepke\/cassandra-python-driver\/blob\/master\/example.py\r\n#IF NOT EXISTS lets the script be re-run without errors\r\nsession.execute(&quot;CREATE KEYSPACE IF NOT EXISTS vindata WITH replication = { 'class': 'SimpleStrategy', 'replication_factor': '1' }&quot;)\r\nsession.execute(&quot;use vindata&quot;)\r\n#http:\/\/www.slideshare.net\/ebenhewitt\/cassandra-datamodel-4985524 slide 23\r\nsession.execute(&quot;&quot;&quot;\r\nCREATE TABLE IF NOT EXISTS emissions (\r\nvin text,\r\nmake text,\r\nyear text,\r\nzip_code_of_station text,\r\nco2 text,\r\nyear_month_key int,\r\nPRIMARY KEY (vin)\r\n)\r\n&quot;&quot;&quot;)\r\n\r\n#https:\/\/www.youtube.com\/watch?v=97VBdgIgcCU\r\n#Load mydata\r\n\r\nimport glob\r\nprint(glob.glob(&quot;.\/data\/*.dat&quot;))\r\n\r\n\r\nfor datafile in glob.glob(&quot;.\/data\/*.dat&quot;):\r\n    f = open(datafile, 'r')\r\n    for row in f:\r\n        #each row is a fixed-width record; slice out the fields\r\n        data = {}\r\n        data['vin'] = row[:20].strip()\r\n        data['make'] = row[20:24].strip()\r\n        data['year'] = row[24:28].strip()\r\n        data['zip_code_of_station'] = row[42:47].strip()\r\n        data['co2'] = row[47:48].strip()\r\n        #year-month key (e.g. 201608) is built from 4 characters of the file name\r\n        ymk = '20' + datafile[-12:-8]\r\n        #the column is declared int, so convert before inserting\r\n        data['year_month_key'] = int(ymk)\r\n        #print(data)\r\n        session.execute(\r\n            &quot;&quot;&quot;\r\n            INSERT INTO emissions (vin, make, year, zip_code_of_station, co2, year_month_key)\r\n            VALUES (%s,%s,%s,%s,%s,%s)\r\n            &quot;&quot;&quot;,\r\n            (data['vin'], data['make'], data['year'], data['zip_code_of_station'], data['co2'], data['year_month_key'])\r\n        )\r\n    
f.close()\r\n\r\n#run the query asynchronously, then block until the result is ready\r\nfuture = session.execute_async(&quot;SELECT * FROM emissions where vin='1B4GP33R9TB205257'&quot;)\r\nrows = future.result()\r\nfor row in rows:\r\n    print(row)\r\n<\/pre>\n<hr>\n<p><img decoding=\"async\" src=\"http:\/\/core0.staticworld.net\/images\/idge\/imported\/article\/nww\/2014\/04\/logo-mongodb-tagline-100275483-orig.png\" alt=\"MongoDB and Python\" \/><br \/>\n<strong>MongoDB Code in Python; Details:<\/strong><\/p>\n<p>Installation:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nsudo aptitude install mongodb\r\nsudo \/etc\/init.d\/mongodb start\r\n#python\r\nvirtualenv -p python3 env_py3\r\nsource env_py3\/bin\/activate\r\npip install pymongo\r\n<\/pre>\n<p>Python:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\n#http:\/\/api.mongodb.com\/python\/current\/tutorial.html\r\nfrom pymongo import MongoClient\r\nclient = MongoClient('mongodb:\/\/localhost:27017\/')\r\n#select database (created lazily on first insert)\r\ndb = client.vindata\r\n#select collection\/table (also created lazily)\r\nemissions = db.emissions\r\n\r\n#Load data from mydata\r\nimport glob\r\nprint(glob.glob(&quot;.\/data\/*.dat&quot;))\r\nfor datafile in glob.glob(&quot;.\/data\/*.dat&quot;):\r\n    f = open(datafile, 'r')\r\n    for row in f:\r\n        #each row is a fixed-width record; slice out the fields\r\n        data = {}\r\n        data['vin'] = row[:20].strip()\r\n        data['make'] = row[20:24].strip()\r\n        data['year'] = row[24:28].strip()\r\n        data['zip_code_of_station'] = row[42:47].strip()\r\n        data['co2'] = row[47:48].strip()\r\n        #year-month key (e.g. '201608') is built from 4 characters of the file name\r\n        ymk = '20' + datafile[-12:-8]\r\n        data['year_month_key'] = ymk\r\n        #print(data)\r\n        #insert_one replaces the deprecated insert()\r\n        emissions.insert_one(data)\r\n    
f.close()\r\n\r\n#count_documents({}) replaces the deprecated count()\r\nemissions.count_documents({})\r\nemissions.find_one()\r\nemissions.find_one({&quot;vin&quot;:&quot;1B4GP33R9TB205257&quot;})\r\n#http:\/\/altons.github.io\/python\/2013\/01\/21\/gentle-introduction-to-mongodb-using-pymongo\/\r\n#https:\/\/www.youtube.com\/watch?v=f7l8PTjQ160&amp;index=4&amp;list=PLGOsbT2r-igmFK9IKEGAnBaklqtuW7l8W\r\n#https:\/\/www.youtube.com\/watch?v=FVyIxdxsyok\r\n\r\n#-------BONUS--------------\r\n#load the query results into a pandas DataFrame\r\nimport pandas\r\ncursor = emissions.find({&quot;year_month_key&quot;:&quot;201608&quot;})\r\nresult = pandas.DataFrame(list(cursor))\r\nresult.describe()\r\nresult.columns\r\n#http:\/\/lucasmanual.com\/mywiki\/Pandas\r\n#later http:\/\/alexgaudio.com\/2012\/07\/07\/monarymongopandas.html\r\n<\/pre>\n<hr>\n<p>Sources:<br \/>\n1. https:\/\/academy.datastax.com\/mongodb-to-cassandra-migration<br \/>\n2. http:\/\/www.slideshare.net\/nkorla1share\/cass-summit-3?qid=f85a27f7-a560-48bb-9d64-6eaa91c39f24&#038;v=&#038;b=&#038;from_search=8<br \/>\n3. https:\/\/www.youtube.com\/watch?v=tg6eIht-00M<br \/>\n4. https:\/\/www.mongodb.com\/customers\/city-of-chicago<br \/>\n5. https:\/\/www.youtube.com\/watch?v=FVyIxdxsyok<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Cassandra Conclusion: &#8220;One way that Cassandra deviates from Mongo is that it offers much more control on how it\u2019s data is laid out. 
Consider a scenario where we are interested in laying out large quantities of data that are related, like a friend\u2019s list. Storing this in MongoDB can be a bit tricky \u2013 it\u2019s&hellip; <a class=\"more-link\" href=\"http:\/\/lucasmanual.com\/blog\/quick-intro-to-cassandra-vs-mongodb-with-python\/\">Continue reading <span class=\"screen-reader-text\">Quick Intro to Cassandra vs MongoDB with python<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,3,4,10],"tags":[12,18,25,16,15,11,13,17,14],"class_list":["post-468","post","type-post","status-publish","format-standard","hentry","category-corporate","category-debian","category-linux","category-nosql","tag-cassandra","tag-cassandra-driver","tag-debian","tag-intro","tag-linux","tag-mongodb","tag-pandas","tag-pymongo","tag-python","entry"],"_links":{"self":[{"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/posts\/468","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/comments?post=468"}],"version-history":[{"count":29,"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/posts\/468\/revisions"}],"predecessor-version":[{"id":586,"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/posts\/468\/revisions\/586"}],"wp:attachment":[{"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/media?parent=468"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/wp\/v2\/categories?post=468"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/lucasmanual.com\/blog\/wp-json\/w
p\/v2\/tags?post=468"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}