Monday, 30 September 2013

Database Enterprise Data Clouds on EMC CLARiiON CX4-960

Introduction
This white paper provides specific recommendations for configuring and managing a CX4-960 system with
related Greenplum software to be an effective participant in an Enterprise Data Cloud (EDC). The
guidelines and best practices presented will let the reader configure their Greenplum/EMC node in the
most cost-efficient way, with an optimal price/performance balance. We start with a basic building block
of servers and EMC storage that can then be replicated (almost infinitely) to scale out to your compute and
storage needs. Using this fundamental building block, you can scale out your Greenplum/EMC-based EDC
from tens of terabytes to tens of petabytes.
Building blocks provide a simple way to configure a balanced I/O subsystem that will perform well with
Greenplum. This approach eliminates the need to perform extensive planning of where to place partitions,
how many servers to purchase, or how to lay out data. The building block can fit in a single 42U rack and
provide 3 GB/s of raw I/O and as much as 30 GB/s of effective I/O (using compression).
As we’ll see throughout the paper, Greenplum and EMC make a particularly attractive combination,
sporting a number of compelling synergies, including:
Deep compression and partitioning using EMC’s storage migration service
Ability to scale compute servers separately from storage servers
No need for mirroring
Improved high availability model using LUN takeover
Enhanced archiving using Greenplum’s Gpsuspend backup utility
Effective virtualization with server motion and load balancing
Configuring a Greenplum/CLARiiON node as part of an EDC is quite straightforward. We’ll describe the
following procedures and best practices:
Greenplum Database Enterprise Data Clouds on EMC CLARiiON CX4-960 Building Blocks
Best Practices Planning
1. Balancing I/O activities between the two storage processors.
2. Creating a balanced data distribution among LUNs.
3. Balancing traffic across the back-end buses.
4. Distributing I/O traffic among the UltraFlex™ I/O modules.
5. Configuring disk and RAID.
6. Setting efficient read/write cache values.
7. Optimizing placement of the transaction log.
8. Deploying Flash drives.
9. Configuring the Greenplum Database and EDC.
Audience
This white paper is intended for Greenplum practitioners and/or IT staff responsible for planning and
managing the storage infrastructure for their enterprise data warehouse and analytic deployments. A basic
understanding of CLARiiON storage and Greenplum Database technologies is assumed.
Terminology
Analytics: The study of operational data using statistical analysis with a goal of identifying and
leveraging patterns to optimize business performance.
Business intelligence (BI): The effective use of information assets to improve the profitability,
productivity, or efficiency of a business. Frequently, IT professionals use this term to refer to the
business applications and tools that enable such information usage. The source of information is
frequently the Enterprise Data Cloud.
Cloud computing: A form of distributed computing where computational, data storage, and other assets
are accessible locally yet are hosted elsewhere.
Compute nodes: Computers that are dedicated to performing complex computations; the underlying data is
generally stored and managed on separate storage servers.
Data warehouse (DW): The process of organizing and managing information assets of an enterprise. IT
professionals often refer to the physically stored data content in some databases managed by database
management software as the data warehouse. They refer to applications that manipulate the data stored in
such databases as DW applications.
Decision support system (DSS): A set of business applications and processes that provide answers in
response to different queries pertaining to the business, based on the business’s information assets, to help
direct or facilitate key business decisions.
Disk array enclosure (DAE): The physical enclosure with disk drive slots to support up to 15 drives to be
accessed from the CLARiiON storage processors using CLARiiON UltraPoint™ connection technology.
DAEs support the ability to grow the total number of disk drives used in a CLARiiON system in a
modular fashion.
Enterprise Data Cloud (EDC): A hardware and software solution designed to enable self-service
provisioning of data warehouses from tens of terabytes to tens of petabytes, on tens to hundreds of nodes
working together in parallel.
Flash drive (FD): A solid-state data storage device with no moving parts. These types of drives deliver
significantly faster performance than traditional disk drives.
I/O cards: The flexible I/O modules that can be added to CLARiiON CX4 systems to expand the
connection ports for increased front-side connections from the servers on the storage area network (SAN),
or back-end ports to provide more I/O paths for the storage system.
Logical unit number (LUN): A storage system object that can be made visible and usable as a server
operating system “disk device” from the underlying storage system.
Massively parallel processing (MPP): A type of distributed computing architecture where tens to
hundreds of processors team up to work concurrently to solve large computational problems.
Private cloud: A conglomeration of compute and storage servers that are generally dedicated to one
particular organization. These computers may be hosted behind the enterprise’s firewall, or may be
distributed across the Internet.
Redundant Array of Inexpensive Disks (RAID): A method of organizing and storing data distributed
over a set of physical disks, which logically appear to be one single storage disk device to any server host
and operating system performing I/O to access and manipulate the stored data. Frequently, redundant data
is distributed and stored inside this set of physical disks to protect against loss of data access should
one of the drives in the set fail.
RAID 5 (R5): A RAID option where the data distributed and stored inside a set of drives is protected by
parity data that is computed from the distributed content and occupies the equivalent of one additional
drive. Under the EMC CLARiiON implementation, the parity data is systematically rotated among all the
drives in that RAID set to avoid write hot spots when parity has to be adjusted for any piece of data stored
inside a LUN or LUNs created from this RAID group. RAID 5 protects against loss of data, or data
inaccessibility, in the event that one of the drives in the RAID set fails.
RAID 10 (R10): A RAID option that combines the performance-enhancing features of RAID 0 with the
data integrity capabilities of RAID 1. Data is striped over mirrored drive pairs.
Scale out: A technique that increases total processing power by adding additional independent
computational nodes, as opposed to augmenting a single, large computer with incremental disk, processor,
or memory resources.
Self-service provisioning: A fundamental philosophy of the Enterprise Data Cloud, where business
analysts are provided with the tools and technology to let them quickly construct their own data warehouses
with minimal support from IT staff.
Shared nothing: A distributed computing architecture made up of a collection of independent,
self-sufficient nodes. This is in contrast with a traditional central computer that hosts all information and
processing in a single location.
Technology overview
Before itemizing the steps necessary to deploy Greenplum on CLARiiON storage as part of an EDC, let’s
examine each of the components in more detail.
Greenplum Database
The Greenplum Database is at the heart of each node in an EDC. It's designed for business intelligence and
analytical processing, utilizing a shared nothing, massively parallel architecture to support tremendous
scalability, multi-level fault tolerance, and redundancy. Since Greenplum’s philosophy is to maximize
uptime while minimizing the IT burden, the database is designed for online system expansion. Typical
installations range from tens of terabytes to petabytes.
Greenplum lets clusters of servers act as a database supercomputer. Although not required, you have the
freedom and flexibility to partition your information using several different criteria, including:
Date
Range
Value
All queries are executed using parallel processing. In spite of this power, developers and users are free to
use familiar SQL statements that conform to any of the following SQL standards:
SQL-92
SQL-99
SQL-2003 OLAP extensions
In addition to working with native SQL, developers are also free to employ MapReduce for high-scale data
analysis. Finally, Greenplum supports access via a broad collection of industry-standard interfaces such as:
SQL
ODBC
JDBC
DBI
Taken together, these capabilities make a node running the combination of Greenplum and CLARiiON an
ideal participant in an EDC.
Greenplum Enterprise Data Cloud
Enterprise Data Clouds represent an innovative new way to manage the information challenges of the 21st
century. The sheer amount of information that must be stored, managed, and queried continues to grow at
an accelerating rate. To make matters worse, this data is most commonly stored in multiple silos, using
multiple formats.
Administering this information collection is proving to be too large a task for most backlogged IT
organizations. There’s little time to configure data warehouses, yet business analysts need access to this
data as quickly as possible.
Existing technologies and architectural approaches have proven to be unable to address these needs. Some
reasons include the following:
OLTP-style databases simply won't scale to support modern data warehousing and analytic
applications.
Enterprise data warehouses were an attempt to create a “data mainframe,” but they are expensive and
rigid, two traits that are distinctly undesirable in today’s cost-conscious, on-demand world.
Furthermore, many organizations have found that building the single, all-encompassing data model
mandated by this approach simply won’t work in the real world.
Data warehousing appliances, which are proprietary, turnkey hardware and software solutions,
perpetuate rigid, fragmented silo approaches to information access.
Given that existing technology has been unable to properly address the dramatic growth and distribution of
information, it’s no surprise that many enterprises find themselves in the predicament illustrated in
Figure 1.
Figure 1. The majority of the organization’s data is hidden and locked away in silos
In contrast to the shortcomings of the above approaches, Enterprise Data Clouds offer a number of
substantial advantages:
Self-service provisioning. Business analysts are provided with a Web-based user interface to quickly
produce virtual data warehouses. These warehouses can be created instantly, and combined from
multiple locations. Greenplum employs scatter/gather streaming technology to let business analysts
quickly load their own data. This relieves IT of many burdens, letting them focus solely on assembling
pools of servers for provisioning.
Parallelization and expandability. The Enterprise Data Cloud offers extreme scale and elastic
expansion. Your data volumes can be dynamically expanded or reduced, depending on your needs. It
also supports massively parallel analytic processing using SQL or MapReduce.
Scalability and performance. Enterprise Data Clouds scale from tens of terabytes to tens of
petabytes. In spite of the sheer amount of data available for access, business analysts are free to run
extensive queries without having to worry about impacting production applications or other analysts.
Deployment flexibility. The Enterprise Data Cloud can run on internal hardware, or using external
resources hosted in the cloud. As we’ll see in this white paper, the EMC CLARiiON CX4-960 is an
ideal platform for this type of application.
Data mart consolidation. Data marts on existing platforms can easily be migrated to the EDC within
a very brief amount of time, while preserving the organization’s investment in supporting technologies
such as business intelligence.
EMC CLARiiON
The EMC CLARiiON CX4 series delivers industry-leading innovation in midrange storage with the
fourth-generation CLARiiON CX storage platform. The unique combination of flexible, scalable hardware
design and advanced software capabilities enables EMC CLARiiON CX4 series systems, powered by Intel
Xeon processors, to meet the growing and diverse needs of today’s midsize and large enterprises. Through
innovative technologies like Flash drives, UltraFlex technology, and CLARiiON Virtual Provisioning™,
customers can decrease costs and energy use while optimizing availability and virtualization.
Configuration building blocks
Greenplum is an MPP database, designed to work optimally in a homogeneous compute environment. By
combining the processing power of several to several hundred machines, you can easily scale Greenplum to
your compute and storage requirements. To this end, we will outline a building block that combines
Greenplum compute servers with EMC CLARiiON storage. Using this building block, you can start with as
few as four servers with 21 TB of aggregate storage and grow to over a thousand machines with 5 PB of
uncompressed storage.
Storage building block
The building block starts with an EMC CX4-960 configured with 71 drives. This configuration will require
21 rack units of space and about 2,300 watts of power, thus occupying half of a standard 42U rack. The
detailed components are as follows:
Two storage processors (SPs)
Five Vault hard drives
64 data drives (600 GB or 1 TB)
Two spare drives
Four FlexIO modules
Five DAEs
One battery backup unit
The net usable space for the database depends on a number of factors, including the drive size, drive
count, and RAID protection level.
The raw usable capacity (R) is defined as follows:
D = Drive count
C = Drive capacity
O = RAID overhead factor (the fraction of capacity left after RAID protection; 0.5 for RAID 10)
R = D * C * O
Assuming RAID 10 with 64 data drives of 1 TB (1,000 GB) each, the equation would look as follows:
R = 64 * 1,000 * 0.5
R = 32,000 GB
Once you have determined R, you need to subtract out file system overhead and sort space required by the
database. For file system overhead, we will assume 10%, which is fairly conservative. For sort space,
we’ll assume 33% overhead, after file system formatting.
Given the above, usable space (U) for the database can be calculated as follows:
U = (R * 0.9) / 1.33
Assuming 1 TB drives, the equation works out as follows:
U = (32,000 * 0.9) / 1.33
U = 21,654 GB
With 2 TB drives on the horizon, the usable space will soon jump to 43 TB. If you plan on using 600 GB
FC-AL drives, the usable space works out to ~13 TB.
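The capacity arithmetic above can be checked with a few lines of Python. This is only a sketch of the
formulas R = D * C * O and U = (R * 0.9) / 1.33, working in GB; the function name and parameter names
are ours and are not part of any Greenplum or EMC tool.

```python
def usable_capacity_gb(drive_count, drive_capacity_gb, raid_factor=0.5,
                       fs_overhead=0.10, sort_overhead=0.33):
    """Return (raw, usable) capacity in GB for one building block.

    raid_factor is the fraction of capacity left after RAID protection
    (0.5 for RAID 10). The overhead defaults are the assumptions stated
    in the text: 10% file system overhead and 33% sort space.
    """
    raw = drive_count * drive_capacity_gb * raid_factor       # R = D * C * O
    usable = raw * (1 - fs_overhead) / (1 + sort_overhead)    # U = (R * 0.9) / 1.33
    return raw, usable


# Compare the three drive sizes discussed in the text (64 data drives each).
for size_gb in (600, 1000, 2000):
    raw, usable = usable_capacity_gb(64, size_gb)
    print(f"{size_gb} GB drives: raw {raw / 1000:.1f} TB, usable {usable / 1000:.1f} TB")
```

Changing raid_factor is a quick way to see how a different protection level would alter the usable
space, although the right factor for any other RAID layout is for you to supply.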
The following table of attributes itemizes the components of the building block:
Table 1. Components of the building block
Item                      Space usage
DAEs (five @ 3U each)     15U
SP chassis                6U
Total                     21U

Disk type                 Count
Data                      64
Hot spares                2
System drives             5
Total                     71 disks
In a RAID 10 configuration, measured sustainable throughput for this building block is approximately 3.1
GB/s.
Scaling out the storage building block
As we will describe later, scaling out your EDC entails deploying additional CX4 units, as illustrated in
Figure 2:
Figure 2. Sample scale-out configuration with four CX4 building blocks and Cisco UCS blades
There are a number of dramatic benefits to this approach, including:
High storage capacity per unit of rack space. Even when the overhead of RAID 10 is included, this can
be as much as two to 10 times greater than other offerings
Superior virtualization with server motion and load balancing
Simpler, proven storage management including disaster recovery and backup
Improved high-availability model using LUN takeover
Compute building block
In addition to storage, the Greenplum system will require a number of commodity servers to provide
compute resources for the database. A typical compute node would be configured as follows:
Two CPU sockets and at least eight CPU cores (total)
4 GB to 6 GB of RAM per CPU core
Two 10 Gb NICs (for redundancy) or four 1 Gb NICs (for redundancy and throughput)
Two dual-port 4 Gb HBAs (for redundancy)
CentOS 5.x, RHEL 5.x, or SLES 10 SP2
In most situations, four servers per CLARiiON storage node would be sufficient. However, since every
workload is unique, this number can easily be doubled or halved, depending upon your compute
requirements. For the majority of workloads, four servers will be a good fit.
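To make the scale-out arithmetic concrete, here is a rough sizing sketch in Python. It takes the per-block
figures quoted in this paper (about 21 TB usable with 1 TB drives, roughly 3.1 GB/s measured throughput,
21U of rack space, and four servers per block) and assumes, as a simplification of ours, that capacity
and throughput add up linearly as blocks are added. The function and constant names are hypothetical.

```python
import math

BLOCK_USABLE_TB = 21.0       # usable space per block with 1 TB drives (from the text)
BLOCK_THROUGHPUT_GBS = 3.1   # measured RAID 10 throughput per block (from the text)
BLOCK_RACK_UNITS = 21        # storage rack space per block (from the text)
SERVERS_PER_BLOCK = 4        # typical ratio; may be doubled or halved per workload


def size_cluster(target_usable_tb, servers_per_block=SERVERS_PER_BLOCK):
    """Estimate how many building blocks a target usable capacity needs.

    Assumes linear scaling of capacity and throughput across blocks.
    """
    blocks = math.ceil(target_usable_tb / BLOCK_USABLE_TB)
    return {
        "blocks": blocks,
        "servers": blocks * servers_per_block,
        "usable_tb": blocks * BLOCK_USABLE_TB,
        "throughput_gbs": round(blocks * BLOCK_THROUGHPUT_GBS, 1),
        "storage_rack_units": blocks * BLOCK_RACK_UNITS,
    }


print(size_cluster(100))    # a 100 TB warehouse
print(size_cluster(5250))   # roughly the 5 PB upper end mentioned earlier
```

Passing a different servers_per_block value models the compute-heavy or compute-light variants
described above without changing the storage side of the estimate.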
Flexibility
One of the most compelling aspects of the Greenplum/CLARiiON configuration is its ease of setup and
maintenance. By detaching the storage nodes from the compute nodes, you’re free to add as many
computational resources as necessary. You may also incorporate additional storage nodes, or simply
expand a single resource without impacting any other nodes. As we’ll see later, configuring the Greenplum
Database is handled through a software interface, thus simplifying managerial and administrative tasks.
Finally, you’re able to choose the optimal connectivity option, from 10 Gb iSCSI to Fibre Channel or Fibre
Channel over Ethernet (FCoE).
Configuration guidelines
Now that we’ve itemized all of the components that make up the Greenplum/CLARiiON building block,
let’s explore how to configure these elements for optimal performance. Note that performance of data
warehousing and analytical workloads is primarily dependent on the ability of the I/O system to perform
sequential scans at high bandwidth.
Even when these systems are used with high concurrency, the database and operating system together
reorder the I/O into blocks of 512 KB reads, which is a “piecewise sequential” workload. Consequently,
the performance tuning of the system focuses on achieving the maximum possible sequential bandwidth
from each CLARiiON array in the system. Additionally, because the Greenplum Database is a scale-out
MPP architecture, multiple CLARiiON arrays are commonly used to increase the I/O performance of the
system beyond the limitations of any single CLARiiON array.
Balanced I/O configuration
MPP database systems operate at peak effectiveness and efficiency when their resources are distributed
across the entire cluster. Thus, it’s optimal to ensure that all of the work that you assign to the CLARiiON
storage platform is balanced. The Greenplum Database will automatically adjust your data across the
compute nodes. However, to achieve the same benefits with the storage system, there are a number of
configuration options to consider, as we’ll explore next.
Balancing I/O activities between the two storage processors
The CX4-960 employs the traditional system architectural technique of having I/O requests from servers
against LUNs automatically serviced by one of the two storage system processors (SPs) that “owns” the
LUN at any one instance in time.
To ensure that all LUNs used by the Greenplum engine are balanced, each SP is configured with exactly
half of the LUNs. In the case of the RAID 10 building block standard, the Greenplum choice is to establish
16 LUNs from eight 4+4 R10 groups using 64 of the 71 drives in the CX4-960 unit, and associating eight
LUNs with each storage processor.
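One way to picture this layout is the short Python sketch below. It carves two LUNs out of each of the
eight RAID 10 groups and alternates ownership between the two SPs. The alternation scheme, the LUN
numbering, and the function name are assumptions made for illustration; the paper only specifies 16 LUNs,
eight groups, and eight LUNs per SP.

```python
from collections import Counter


def assign_luns(raid_groups=8, luns_per_group=2):
    """Build a LUN layout in which each RAID group has one LUN on each SP.

    Alternating owners inside every group is an illustrative choice; any
    scheme that ends with half the LUNs on each SP satisfies the text.
    """
    layout = []
    for rg in range(raid_groups):
        for i in range(luns_per_group):
            lun_id = rg * luns_per_group + i
            owner = "SP-A" if i % 2 == 0 else "SP-B"
            layout.append({"lun": lun_id, "raid_group": rg, "owner": owner})
    return layout


layout = assign_luns()
print(len(layout), "LUNs")
print(Counter(entry["owner"] for entry in layout))
```

Splitting ownership within each RAID group, rather than giving whole groups to one SP, keeps both SPs
busy whenever any group is being scanned.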
Balancing data distributions among LUNs
To improve performance, streamline operations, and reduce manual administrative responsibilities, the
Greenplum Database automatically distributes data across nodes and LUNs. The 16 LUNs on the
CLARiiON are organized by Greenplum such that each of the four compute nodes owns four of the LUNs.
By default, queries will be processed in parallel across these LUNs, thereby engaging all spindles and
CPUs to compute the results.
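The mapping of 16 LUNs onto four compute nodes can be sketched as follows. The host names, the LUN
numbering, and the rule that even-numbered LUNs belong to SP-A are hypothetical; the point is simply
that each host can be given four LUNs, two from each SP, so that no host leans on a single SP.

```python
def map_luns_to_hosts(lun_count=16, host_count=4):
    """Give each compute node an equal, SP-balanced share of the LUNs.

    Assumes LUN ownership alternates SP-A, SP-B, SP-A, ... by LUN number.
    """
    luns = [(lun, "SP-A" if lun % 2 == 0 else "SP-B") for lun in range(lun_count)]
    per_host = lun_count // host_count
    hosts = {}
    for h in range(host_count):
        # Each host takes a contiguous run of LUNs, which alternates between SPs.
        hosts[f"host{h + 1}"] = luns[h * per_host:(h + 1) * per_host]
    return hosts


for host, luns in map_luns_to_hosts().items():
    print(host, luns)
```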
Balancing traffic across the back-end buses
In the CLARiiON system architecture, each disk drive in a DAE is simultaneously available from both
SP-A and SP-B. Each SP uses one of its back-end I/O ports to communicate with the DAE’s link control
card (LCC).
A back-end bus consists of a back-end I/O port from each SP connecting into the same DAE. For the
Greenplum building block configuration that we’re describing in the paper, there are five back-end bus
ports used on each SP, which results in two back-end buses per DAE. This combination ensures that each
back-end bus has the same number of drives, and is perfectly balanced inside the storage array.
Each DAE has two 4 Gb/s connections, one from each SP, which gives it a theoretical throughput of 720
MB/s. In reality, however, you will typically experience a throughput rate of less than 600 MB/s from each
DAE, due to the aggregate processing limitations of the CX4 SP.
The practical bandwidth of the pair of CX4 SPs for real-world, bandwidth-intensive workloads is
approximately 3 GB/s. The configuration using five DAEs that we’ve been describing is designed to fully
drive the disks and DAEs to achieve that peak SP bandwidth. Additional DAEs and drives will allow the
usable capacity to be boosted but will not change the available peak bandwidth on the CX4-960.
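The bottleneck reasoning here reduces to taking the smaller of two numbers, as the following Python
sketch shows. It treats 600 MB/s as an upper bound on what one DAE delivers in practice and 3,000 MB/s
as the approximate ceiling of the SP pair, both taken from the figures above; the constant and function
names are ours.

```python
DAE_PRACTICAL_MBS = 600    # practical upper bound per DAE quoted in the text
SP_PAIR_LIMIT_MBS = 3000   # approximate practical limit of the two SPs together


def peak_bandwidth_mbs(dae_count):
    """Peak array bandwidth: the DAEs' combined rate, capped by the SP pair."""
    return min(dae_count * DAE_PRACTICAL_MBS, SP_PAIR_LIMIT_MBS)


# Fewer than five DAEs leaves SP bandwidth unused; more than five adds
# capacity but no bandwidth.
for daes in (3, 5, 8):
    print(f"{daes} DAEs -> {peak_bandwidth_mbs(daes)} MB/s")
```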
Balancing I/O traffic among the UltraFlex I/O modules
In a standard CX4 system, the FC ports in the I/O modules are partitioned evenly between front-end and
back-end ports. This division helps balance I/O traffic through the CLARiiON storage processors. Should
more front-end or back-end ports be needed, the CX4-960 lets you add more as necessary.
In general, it’s a good idea to balance the number of front-end and back-end ports. However, this isn’t the
case with the Greenplum building block that we’re describing in this paper. In this example, there are a
total of eight front-end ports and five back-end ports per SP. The rationale behind this configuration is
that additional back-end ports are not necessary: with the CLARiiON architecture, 10 total back-end ports
(five per SP) are already able to supply enough bandwidth to fully engage the CLARiiON storage
processors.
Disk and RAID configuration
For the building block that we’re describing in this paper, we recommend RAID 10 protection. Although
RAID 10 has the largest overhead with respect to data protection, it also lets you employ slower SATA
drives. Despite using SATA drives, RAID 10 still lets you realize the performance you would expect from
FC-AL- or SCSI-based drives. By closing the performance gap between FC-AL and SATA drives, RAID
10 delivers superior cost-effectiveness with both 1 TB and (eventually) 2 TB disks.
Cache memory configuration
The CX4-960’s kernel lets you use up to 16 GB of physical system memory, which is a key performance
differentiator. Even after allowing for system-resident kernel software and drivers, there is still more than
13 GB of storage processor memory available to be configured as data cache for use by the
different LUNs. This approach lets you configure dramatically larger read and write caches. For example,
you may configure more than 10 GB of mirrored write cache, which is a significant increase from the 3 GB
maximum supported in the CX3-80.
In addition, the CLARiiON cache configuration supports a wide range of cache page sizes. Applications
such as EDC tend to use larger database page sizes, such as 32 KB. In general, it’s wise to choose a cache
page size that divides evenly into the database page size. Selecting the maximum value (16 KB) will tend
to enable the storage system to send the largest possible single disk I/O requests through the back end to
the physical disk drives.
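The page-size rule of thumb can be expressed as a small Python helper. The list of selectable cache page
sizes (2, 4, 8, and 16 KB) is an assumption on our part; only the 16 KB maximum and the 32 KB database
page size come from the text. Check the sizes your array actually offers before relying on it.

```python
CACHE_PAGE_SIZES_KB = (2, 4, 8, 16)   # assumed choices; 16 KB is the stated maximum


def pick_cache_page_kb(db_page_kb, choices=CACHE_PAGE_SIZES_KB):
    """Return the largest cache page size that divides the database page size."""
    fitting = [size for size in choices if db_page_kb % size == 0]
    if not fitting:
        raise ValueError(f"no cache page size divides {db_page_kb} KB evenly")
    return max(fitting)


print(pick_cache_page_kb(32))   # database page size used as the example in the text
```

With a 32 KB database page the helper selects the 16 KB maximum, so each database page maps onto
exactly two cache pages.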
Write cache configuration
Given the massive size of most EDC deployments, there’s minimal advantage to allotting significant cache
for reading or prefetch operations. Thus, we recommend that all cache memory be assigned to the write
cache. By following this approach, you help ensure that writes are batched as large as possible, which can
dramatically enhance the rate at which the database loads. Additionally, it’s quite common to experience
numerous “smaller” write operations that are driven by sorting or temporary area overflow. Since you’ve
maximized the write cache, these additional write operations will also occur as quickly as possible.
Read cache configuration
Surprisingly, given the predominant read patterns typical of EDC-style workloads, allocating storage read
caching for data LUNs generally doesn’t produce as much of a performance payoff as might be expected.
As a matter of fact, implicit LUN data prefetching may have the counterproductive effect of wasting I/O:
Data that has been prefetched may not be consumed before it is aged out of the cache. Because of the
caching present on database servers, it’s nearly impossible for the CLARiiON storage system to achieve
any meaningful read cache rehits.
Thus, we recommend disabling data caching and prefetching on all data LUNs and at the array level. To
disable LUN prefetching (which can be done dynamically), set the prefetch policy for a particular LUN to
NONE using the Navisphere® Manager or via NAVISECCLI. Read cache can easily be disabled at the
array level within Navisphere, eliminating the need to disable read caching for each LUN.
Transaction log placement
Best practices for most database systems dictate that the transaction log be placed on its own dedicated
storage, even Flash drives if available. The Greenplum Database, on the other hand, doesn’t have this
requirement, since there is minimal logging. By leveraging visibility bits, Greenplum is able to avoid much
of the logging that occurs in other database solutions. This is another example of Greenplum’s simplicity.
Deploying Flash drives
Flash drives (FD) are one of the many storage options available with the CLARiiON product line. Without
any moving parts, FDs are capable of servicing many random I/O requests per unit time, with
low-millisecond I/O service times. A typical FD can sustain up to 200 MB/s of random or sequential I/O. As
an added benefit, unlike traditional spinning disks, there is no penalty for random I/O when using FDs.
Although the majority of EDC workloads don’t require FDs, there are certain scenarios where they add
value:
As a staging area where new table partitions can be loaded and later migrated to traditional storage.
This can be useful if the most recent data is accessed more frequently or if trickle feeding is used.
High-performance storage of indexed tables. When an index is used to address table data, the access
pattern to the base table becomes random instead of sequential. By placing the base table on FD,
indexed access achieves maximum performance. The indexes themselves can be stored on normal disk
in these configurations. Greenplum supports B-Tree, Bitmap, GIST, hash, and other indexing
methods, although these types of indexes are not commonly employed in data warehouse
environments.
Temporary storage area for use in doing ELT or other transformations.
Although not included by default, the Greenplum/CLARiiON building block that we’re describing in this
paper includes four empty drive bays that can be used for FDs. These drive bays are balanced across four
DAEs and eight SP back-end storage buses, ensuring optimal performance. If you require additional FD
storage space, then you’ll need to conduct some additional planning. In this case, your options include
adding additional DAEs or substituting existing drives with FDs. In general, deploying and configuring
FDs should be considered on a case-by-case basis.
Greenplum/CLARiiON building block and EDC
Now that we’ve described the optimal configuration for the CX4-960 building block, let’s examine how
Greenplum employs this foundation as a participant in an EDC. For brevity’s sake, this exploration will be
high level; for more detailed hands-on instructions, consult the Greenplum Database Administrator Guide.
If you’re using a traditional hardware configuration, the Greenplum/EDC building block delivers an
excellent experience with superior performance. However, as we described earlier, sophisticated
virtualization support is one of the most attractive capabilities of this combination. Unlike technologies
from other database providers, Greenplum’s parallelization architecture lets it aggregate the performance
from multiple slower virtual machines to deliver dramatically faster results. While virtual machines always
introduce some degree of latency, Greenplum’s aggregation approach along with its increased flexibility
and superior management capabilities outweigh these drawbacks.
As the logical view in Figure 3 shows, each CX4-960 LUN can be represented with one or more mount
points (also described as filespaces). These filespaces service VMware ESX containers, thus enabling the
server and database virtualization that are fundamental components of an EDC. For optimum efficiency,
there will be four filespaces per LUN, thus saturating I/O bandwidth. For enhanced flexibility, you may
elect to create additional filespaces, but there should always be a minimum of four.
[Figure labels: Filespace 1, Filespace 2, Filespace 3, Filespace 4]
Figure 3. Virtualization and storage
This is a powerful architecture, yet Greenplum’s self-service provisioning capabilities and intuitive
software hide the underlying complexity from the analysts, as illustrated in Figure 4.