Casbah - Scala toolkit for MongoDB
Casbah is a Scala toolkit for MongoDB and it integrates a layer on top of the official mongo-java-driver for
better integration with Scala.
The recommended way to get started is with a dependency management system.
---------------------------------------------
You could get the source from :
https://github.com/alvinj/ ScalaCasbahConnections
Then you could modify your
-----------------
manju@manju-VirtualBox:~/
mongoConnector/ ScalaCasbahConnections$ cat build.sbt
organization := "com.alvinalexander"
name := "ScalatraCasbahMongo"
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.11.8"
libraryDependencies += "org.mongodb" %% "casbah" % "3.1.1"
libraryDependencies += "com.mongodb.casbah" % "casbah-gridfs_2.8.1" % "2.1.5-1"
libraryDependencies += "org.slf4j" % "slf4j-log4j12" % "1.7.24"
resolvers += "Sonatype OSS Snapshots" at "http://oss.sonatype.org/ content/repositories/ snapshots/"
spb@spb-VirtualBox:~/ mongoConnector/ ScalaCasbahConnections$
manju@manju-VirtualBox:~/
The recommended way to get started is with a dependency management system.
---------------------------------------------
You could get the source from :
https://github.com/alvinj/
Then you could modify your
-----------------
manju@manju-VirtualBox:~/
organization := "com.alvinalexander"
name := "ScalatraCasbahMongo"
version := "0.1.0-SNAPSHOT"
scalaVersion := "2.11.8"
libraryDependencies += "org.mongodb" %% "casbah" % "3.1.1"
libraryDependencies += "com.mongodb.casbah" % "casbah-gridfs_2.8.1" % "2.1.5-1"
libraryDependencies += "org.slf4j" % "slf4j-log4j12" % "1.7.24"
resolvers += "Sonatype OSS Snapshots" at "http://oss.sonatype.org/
spb@spb-VirtualBox:~/
manju@manju-VirtualBox:~/
mongoConnector/ ScalaCasbahConnections$ sbt run
[info] Loading project definition from /home/spb/mongoConnector/ ScalaCasbahConnections/project
[info] Set current project to ScalatraCasbahMongo (in build file:/home/spb/mongoConnector/ Scala
[info] Loading project definition from /home/spb/mongoConnector/
[info] Set current project to ScalatraCasbahMongo (in build file:/home/spb/mongoConnector/
CasbahConnections/)
[info] Compiling 1 Scala source to /home/spb/mongoConnector/ ScalaCasbahConnections/target/
[info] Compiling 1 Scala source to /home/spb/mongoConnector/
[warn] there was one deprecation warning; re-run with -deprecation for details
[warn] one warning found
[info] Running casbahtests.MainDriver
debug: a
log4j:WARN No appenders could be found for logger (com.mongodb.casbah.commons. conversions.
scala. RegisterConversionHelpers$).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/ log4j/1.2/faq.html#noconfig for more info.
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/
debug: b
debug: c
debug: d
debug: e
debug: f
debug: g
debug: h
debug: i
debug: j
debug: k
debug: l
debug: m
debug: n
debug: o
debug: p
debug: q
debug: r
debug: s
debug: t
debug: u
debug: v
debug: w
debug: x
debug: y
debug: z
sleeping at the end
sleeping: 1
sleeping: 2
sleeping: 3
sleeping: 4
sleeping: 5
sleeping: 6
sleeping: 7
sleeping: 8
sleeping: 9
sleeping: 10
sleeping: 11
sleeping: 12
sleeping: 13
sleeping: 14
sleeping: 15
sleeping: 16
sleeping: 17
sleeping: 18
sleeping: 19
sleeping: 20
sleeping: 21
sleeping: 22
sleeping: 23
sleeping: 24
sleeping: 25
sleeping: 26
sleeping: 27
sleeping: 28
sleeping: 29
sleeping: 30
game over
debug: c
debug: d
debug: e
debug: f
debug: g
debug: h
debug: i
debug: j
debug: k
debug: l
debug: m
debug: n
debug: o
debug: p
debug: q
debug: r
debug: s
debug: t
debug: u
debug: v
debug: w
debug: x
debug: y
debug: z
sleeping at the end
sleeping: 1
sleeping: 2
sleeping: 3
sleeping: 4
sleeping: 5
sleeping: 6
sleeping: 7
sleeping: 8
sleeping: 9
sleeping: 10
sleeping: 11
sleeping: 12
sleeping: 13
sleeping: 14
sleeping: 15
sleeping: 16
sleeping: 17
sleeping: 18
sleeping: 19
sleeping: 20
sleeping: 21
sleeping: 22
sleeping: 23
sleeping: 24
sleeping: 25
sleeping: 26
sleeping: 27
sleeping: 28
sleeping: 29
sleeping: 30
game over
[success] Total time: 62 s, completed 13 Mar, 2017 5:37:31 PM
spb@spb-VirtualBox:~/ mongoConnector/ ScalaCasbahConnections$
spb@spb-VirtualBox:~/ mongoConnector/ ScalaCasbahConnections$ sbt package
[info] Loading project definition from /home/spb/mongoConnector/ ScalaCasbahConnections/project
[info] Set current project to ScalatraCasbahMongo (in build file:/home/spb/mongoConnector/ Scala
spb@spb-VirtualBox:~/
spb@spb-VirtualBox:~/
[info] Loading project definition from /home/spb/mongoConnector/
[info] Set current project to ScalatraCasbahMongo (in build file:/home/spb/mongoConnector/
CasbahConnections/)
[info] Packaging /home/spb/mongoConnector/ ScalaCasbahConnections/target/ scala-2.11/ scala
[info] Packaging /home/spb/mongoConnector/
tracasbahmongo_2.11-0.1. 0-SNAPSHOT.jar ...
[info] Done packaging.
[success] Total time: 1 s, completed 13 Mar, 2017 5:54:42 PM
spb@spb-VirtualBox:~/ mongoConnector/ ScalaCasbahConnections$
[success] Total time: 1 s, completed 13 Mar, 2017 5:54:42 PM
spb@spb-VirtualBox:~/
------------------------------ ------------------------------
spb@spb-VirtualBox:~/Scala_ project$ mongo
MongoDB shell version: 3.2.12
connecting to: test
Server has startup warnings:
> show dbs
local 0.000GB
mydb 0.000GB
> show dbs
finance 0.000GB
local 0.000GB
mydb 0.000GB
> show collections
> use finance
switched to db finance
> show collections
stocks
> db.stocks.find()
{ "_id" : ObjectId(" 58cd184edffa1f1829bfbc94"), "name" : "a", "symbol" : "a" }
{ "_id" : ObjectId(" 58cd184fdffa1f1829bfbc95"), "name" : "b", "symbol" : "b" }
{ "_id" : ObjectId(" 58cd1850dffa1f1829bfbc96"), "name" : "c", "symbol" : "c" }
{ "_id" : ObjectId(" 58cd1851dffa1f1829bfbc97"), "name" : "d", "symbol" : "d" }
{ "_id" : ObjectId(" 58cd1852dffa1f1829bfbc98"), "name" : "e", "symbol" : "e" }
{ "_id" : ObjectId(" 58cd1853dffa1f1829bfbc99"), "name" : "f", "symbol" : "f" }
{ "_id" : ObjectId(" 58cd1854dffa1f1829bfbc9a"), "name" : "g", "symbol" : "g" }
{ "_id" : ObjectId(" 58cd1855dffa1f1829bfbc9b"), "name" : "h", "symbol" : "h" }
{ "_id" : ObjectId(" 58cd1856dffa1f1829bfbc9c"), "name" : "i", "symbol" : "i" }
{ "_id" : ObjectId(" 58cd1857dffa1f1829bfbc9d"), "name" : "j", "symbol" : "j" }
{ "_id" : ObjectId(" 58cd1858dffa1f1829bfbc9e"), "name" : "k", "symbol" : "k" }
{ "_id" : ObjectId(" 58cd1859dffa1f1829bfbc9f"), "name" : "l", "symbol" : "l" }
{ "_id" : ObjectId(" 58cd185adffa1f1829bfbca0"), "name" : "m", "symbol" : "m" }
{ "_id" : ObjectId(" 58cd185bdffa1f1829bfbca1"), "name" : "n", "symbol" : "n" }
{ "_id" : ObjectId(" 58cd185cdffa1f1829bfbca2"), "name" : "o", "symbol" : "o" }
{ "_id" : ObjectId(" 58cd185ddffa1f1829bfbca3"), "name" : "p", "symbol" : "p" }
{ "_id" : ObjectId(" 58cd185edffa1f1829bfbca4"), "name" : "q", "symbol" : "q" }
{ "_id" : ObjectId(" 58cd185fdffa1f1829bfbca5"), "name" : "r", "symbol" : "r" }
{ "_id" : ObjectId(" 58cd1860dffa1f1829bfbca6"), "name" : "s", "symbol" : "s" }
{ "_id" : ObjectId(" 58cd1861dffa1f1829bfbca7"), "name" : "t", "symbol" : "t" }
Type "it" for more
>
-------------------------
spb@spb-VirtualBox:~/Scala_
MongoDB shell version: 3.2.12
connecting to: test
Server has startup warnings:
> show dbs
local 0.000GB
mydb 0.000GB
> show dbs
finance 0.000GB
local 0.000GB
mydb 0.000GB
> show collections
> use finance
switched to db finance
> show collections
stocks
> db.stocks.find()
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
{ "_id" : ObjectId("
Type "it" for more
>
-------------------------
----------------
There are two ways of getting the data from MongoDB to Apache Spark.
Method 1: Using Casbah (Layer on MongDB Java Driver)
val uriRemote = MongoClientURI("mongodb://RemoteURL:27017/")
val mongoClientRemote = MongoClient(uriRemote)
val dbRemote = mongoClientRemote("dbName")
val collectionRemote = dbRemote("collectionName")
val ipMongo = collectionRemote.find
val ipRDD = sc.makeRDD(ipMongo.toList)
ipRDD.saveAsTextFile("hdfs://path/to/hdfs")
Method 2: Spark Worker at our use
Better version of code: Using Spark worker and multiple core to use to get the data in short time.
val config = new Configuration()
config.set("mongo.job.input.format","com.mongodb.hadoop.MongoInputFormat")
config.set("mongo.input.uri", "mongodb://RemoteURL:27017/dbName.collection
Name")
val keyClassName = classOf[Object]
val valueClassName = classOf[BSONObject]
val inputFormatClassName = classOf[com.mongodb.hadoop.MongoInputFormat]
val ipRDD = sc.newAPIHadoopRDD(config,inputFormatClassName,keyClassName,
valueClassName)
ipRDD.saveAsTextFile("hdfs://path/to/hdfs")
---------------------------------------------------------------
Reference:
https://web.archive.org/web/20120402085626/http://api.mongodb.org/scala/casbah/current
/setting_up.html#setting-up-sbt