summaryrefslogtreecommitdiffstats
path: root/xlators/mgmt/glusterd
Commit message (Collapse)AuthorAgeFilesLines
* glusterd: fix glusterd_add_inode_size_to_dict to avoid glusterd to crashAtin Mukherjee2016-03-111-23/+26
| | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/#/c/8134/ http://review.gluster.org/#/c/8492/ There were couple of backports from mainline got missed and due to which glusterd crashes if the underlying file system doesn't fail under list of supported file systems. This patch takes care of handling this negative scenario. Reported-by: Michael Martel <michael.martel@vsc.edu> Change-Id: I6f601a4421869bbd7fc26e31f4ca4ffe075c0924 BUG: 1316116 Signed-off-by: Atin Mukherjee <amukherj@redhat.com> Reviewed-on: http://review.gluster.org/13661 NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org> Smoke: Gluster Build System <jenkins@build.gluster.com> CentOS-regression: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* cluster/afr: Make read child match check in afr optionalv3.5.4beta1Krutika Dhananjay2015-03-181-0/+6
| | | | | | | | | | | | | | | | | | Backport of: http://review.gluster.org/#/c/9917 Having this particular check which was introduced by commit bb2df4e63fa8a5d65f18b4a5efc757e8d475fbff causes a drop in performance in readdirp. So the behavior is made configurable with this patch. Change-Id: I9012a6bb955229a0cbb48f06e4e2edc0782dfead BUG: 1202675 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/9924 Reviewed-by: Pranith Kumar Karampuri <pkarampu@redhat.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* core: fix Ubuntu code audit (cppcheck) resultsKaleb S. KEITHLEY2014-11-265-22/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | See http://review.gluster.org/#/c/7583/ BZ 1086460 AFAICT these are false positives: [geo-replication/src/gsyncd.c:99]: (error) Memory leak: str [geo-replication/src/gsyncd.c:395]: (error) Memory leak: argv [xlators/nfs/server/src/nlm4.c:1200]: (error) Possible null pointer dereference: fde Program exits, resource leak not an issue [extras/geo-rep/gsync-sync-gfid.c:105]: (error) Resource leak: fp Test program: [extras/test/test-ffop.c:27]: (error) Buffer overrun possible for long command line arguments. Not built: [xlators/cluster/ha/src/ha.c:2699]: (error) Possible null pointer dereference: priv The remainder are fixed with this change-set: [heal/src/glfs-heal.c:357]: (error) Possible null pointer dereference: remote_subvol [libglusterfs/src/xlator.c:648]: (error) Uninitialized variable: gfid [libglusterfs/src/xlator.c:649]: (error) Uninitialized variable: gfid [xlators/cluster/afr/src/afr-inode-write.c:469]: (error) Possible null pointer dereference: frame [xlators/cluster/afr/src/afr-self-heal-common.c:1704]: (error) Possible null pointer dereference: local [xlators/cluster/dht/src/dht-rebalance.c:1643]: (error) Possible null pointer dereference: ctx [xlators/cluster/stripe/src/stripe.c:4963]: (error) Possible null pointer dereference: local [xlators/features/changelog/src/changelog.c:1464]: (error) Possible null pointer dereference: priv [xlators/mgmt/glusterd/src/glusterd-geo-rep.c:1656]: (error) Possible null pointer dereference: command [xlators/mgmt/glusterd/src/glusterd-replace-brick.c:914]: (error) Resource leak: file [xlators/mgmt/glusterd/src/glusterd-replace-brick.c:998]: (error) Resource leak: file [xlators/mgmt/glusterd/src/glusterd-sm.c:248]: (error) Possible null pointer dereference: new_ev_ctx [xlators/mgmt/glusterd/src/glusterd-store.c:1332]: (error) Possible null pointer dereference: handle [xlators/mgmt/glusterd/src/glusterd-utils.c:4706]: (error) Possible null pointer dereference: this [xlators/mgmt/glusterd/src/glusterd-utils.c:5613]: (error) Possible null pointer dereference: this [xlators/mgmt/glusterd/src/glusterd-utils.c:6342]: (error) Possible null pointer dereference: path_tokens [xlators/mgmt/glusterd/src/glusterd-utils.c:6343]: (error) Possible null pointer dereference: path_tokens [xlators/mount/fuse/src/fuse-bridge.c:4591]: (error) Uninitialized variable: finh [xlators/mount/fuse/src/fuse-bridge.c:3004]: (error) Possible null pointer dereference: state [xlators/nfs/server/src/nfs-common.c:89]: (error) Dangerous usage of 'volname' (strncpy doesn't always null-terminate it). [xlators/performance/quick-read/src/quick-read.c:585]: (error) Possible null pointer dereference: iobuf Rerunning cppcheck afterwards: As before, test program: [extras/test/test-ffop.c:27]: (error) Buffer overrun possible for long command line arguments. As before, believed to be false positive: [geo-replication/src/gsyncd.c:99]: (error) Memory leak: str [geo-replication/src/gsyncd.c:395]: (error) Memory leak: argv [xlators/nfs/server/src/nlm4.c:1200]: (error) Possible null pointer dereference: fde As before, not built: [xlators/cluster/ha/src/ha.c:2699]: (error) Possible null pointer dereference: priv False positive after fix: [heal/src/glfs-heal.c:356]: (error) Possible null pointer dereference: remote_subvol [xlators/cluster/stripe/src/stripe.c:4963]: (error) Possible null pointer dereference: local Change-Id: Ib3029d3223f5a13e2ac386a527d64d5ffe3ecb90 BUG: 1092037 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/7605 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* glusterd: pass the bind-address to starting servicesNiels de Vos2014-10-211-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When the transport.socket.bind-address option is set to a hostname or ip-address, the services started by GlusterD fail to connect to the management daemon. GlusterD always forces the services to connect to the "localhost" hostname, even if it is not listening on that address. GlusterD should take the transport.socket.bind-address option into consideration, and pass that to the glusterfs-clients with the -s or --volfile commandline parameter. Note that this is not a change that removes all hard-coded dependencies on "localhost". This change merely makes it possible to start required services when the transport.socket.bind-address option is set. Cherry picked from commit 283fa797f4bf98130b42c36972305b8cb6e5aaaf: > Change-Id: I36a0ed6c69342e6327adc258fea023929055d7f2 > BUG: 1149863 > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/8908 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> > Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> > Tested-by: Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: I36a0ed6c69342e6327adc258fea023929055d7f2 BUG: 1149857 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8952 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: make bricks respect 'transport.socket.bind-address'Niels de Vos2014-10-211-0/+8
| | | | | | | | | | | | | | | | | | | | | | When GlusterD starts the brick processes, these will listen on all interfaces. When the 'transport.socket.bind-address' option is set in glusterd.vol, the brick processes should only listen on the specified hostname or IP-address. Cherry picked from commit 430b874c4f1a171c106a9e1e6507e14e79805a1d: > Change-Id: I8e7d1f294904081137c23f3446261329d0d13bba > BUG: 1149863 > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/8910 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> > Tested-by: Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: I8e7d1f294904081137c23f3446261329d0d13bba BUG: 1149857 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8953 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd/quota: Heal pgfid xattr on existing data when the quota isvmallika2014-10-201-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | enable Backport of: http://review.gluster.org/8878 The pgfid extended attributes are used to construct the ancestry path (from the file to the volume root) for nameless lookups on files. As NFS relies on nameless lookups heavily, quota enforcement through NFS would be inconsistent if quota were to be enabled on a volume with existing data. Solution is to heal the pgfid extended attributes as a part of lookup perfomed by quota-crawl process. In a posix lookup check for pgfid xattr and if it is missing set the xattr. Change-Id: I956128907aa1d975cd5719ed3ab2f4f9b37d4c31 BUG: 1153900 Signed-off-by: vmallika <vmallika@redhat.com> Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/8938 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* glusterd: fix compile warningNiels de Vos2014-09-091-1/+0
| | | | | | | | | | | | | | | | | | The following warning has been moved to an error and prevents the smoke tests in Jenkins to succeed. cc1: warnings being treated as errors /d/var_lib_jenkins_jobs/smoke/workspace/xlators/mgmt/glusterd/src/glusterd-utils.c: In function ‘glusterd_add_inode_size_to_dict’: /d/var_lib_jenkins_jobs/smoke/workspace/xlators/mgmt/glusterd/src/glusterd-utils.c:5038: error: unused variable ‘inode_size’ The warning was introduced with http://review.gluster.org/8491. Change-Id: I0c824aaf6df70dea35364af6fa72f34eea8c9829 BUG: 1081016 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8663 Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* xlators/mgmt: don't allow glusterd fork bomb (cache the brick inode size)Niels de Vos2014-08-151-30/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Was don't leave zombies if required programs aren't installed Also, the existing if (strcmp (foo, bar) == 0) antipattern leaves me underwhelmed -- table driven is better; I like fully qualified paths to system tools too. File systems aren't going to change their inode size. Rather than fork-and-exec a tool repeatedly, hang on to the answer for subsequent use. Even if there are hundreds of volumes the size of a dict to keep this in memory is small. Cherry picked from commit f20d0ef8ad7d2f65a9234fc11101830873a9f6ab: > Change-Id: I704a8b1215446488b6e9e051a3e031af21b37adb > BUG: 1081013 > Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> > Reviewed-on: http://review.gluster.org/8134 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> > Tested-by: Krishnan Parthasarathi <kparthas@redhat.com> Change-Id: I704a8b1215446488b6e9e051a3e031af21b37adb BUG: 1081016 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8491 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* glusterd: call runner_end even if runner_start failsNiels de Vos2014-08-151-0/+11
| | | | | | | | | | | | | | | | | | | Cherry picked from commit aa199093fdf37dcd87a73cea83f9b9164d5800c5: > Change-Id: I5eca01a131307ba3be2aed4922eea73025ff284c > BUG: 1081013 > Signed-off-by: Jeff Darcy <jdarcy@redhat.com> > Reviewed-on: http://review.gluster.org/7360 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Niels de Vos <ndevos@redhat.com> > Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> > Reviewed-by: Anand Avati <avati@redhat.com> Change-Id: I5eca01a131307ba3be2aed4922eea73025ff284c BUG: 1081016 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8490 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* dict: add dict_set_dynstr_with_allocNiels de Vos2014-08-151-23/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is an overwhelming no. of instances of the following pattern in glusterd module. ... char *dynstr = gf_strdup (str); if (!dynstr) goto err; ret = dict_set_dynstr (dict, key, dynstr); if (ret) goto err; ... With this changes it would look as below, ret = dict_set_dynstr_with_alloc (dict, key, str); if (ret) goto err; Cherry picked from commit a9d4d369efc978511e3cb69e5643945710cc9416: > Change-Id: I6a47b1cbab4834badadc48c56d0b5c8c06c6dd4d > Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> > Reviewed-on: http://review.gluster.org/7379 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Backport notes: Included this change to accommodate additional backports. BUG: 1081016 Change-Id: I6a47b1cbab4834badadc48c56d0b5c8c06c6dd4d Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8489 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* rpcsvc: Validate RPC procedure number before fetchSantosh Kumar Pradhan2014-07-083-12/+12
| | | | | | | | | | | | | | | | | | | | | While accessing the procedures of given RPC program in, rpcsvc_get_program_vector_sizer(), It was not checking boundary conditions which would cause buffer overflow and subsequently SEGV. Make sure rpcsvc_actor_t arrays have numactors number of actors. FIX: Validate the RPC procedure number before fetching the actor. Upstream main review: http://review.gluster.org/7726 BUG: 1096020 Change-Id: Iaf207ee976cb56fa9a554ec82c9eab36d3b289ed Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/8228 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* gNFS: Make NFS DRC off by defaultNiels de Vos2014-06-101-1/+1
| | | | | | | | | | | | | | | | | | | | | | | DRC in NFS causes memory bloat and there are known memory corruptions. It would be good to disable drc by default till the feature is stable. Cherry picked from 4215d071cec4fc8a62ca4fd6212d83f931838829: > Change-Id: I93db6ef5298672c56fb117370bb582a5e5550b17 > BUG: 1105524 > Original-patch-by: Santosh Kumar Pradhan <spradhan@redhat.com> > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/8004 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> > Reviewed-by: Santosh Pradhan <spradhan@redhat.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I93db6ef5298672c56fb117370bb582a5e5550b17 BUG: 1105524 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8013 Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd: Better op-version values and rangesNiels de Vos2014-06-101-3/+3
| | | | | | | | | | | | | | | | | | | | | | | Till now, the op-version was an incrementing integer that was incremented by 1 for every Y release (when using the X.Y.Z release numbering). This is not flexible enough to handle backports of features into Z releases. Going forward, from the upcoming 3.6.0 and 3.5.1 releases, the op-versions will be multi-digit integer values composed of the version numbers, instead of a simple incrementing integer. An X.Y.Z release will have XYZ as its op-version. Y and Z will always be 2 digits wide and will be padded with 0 if required. This way of bumping op-versions allows for gaps in between the subsequent Y releases. These gaps will allow backporting features from new Y releases into old Z releases. Change-Id: Ib6a09989f03521146e299ec0588fe36273191e47 Depends-on: http://review.gluster.org/7963 BUG: 1096425 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/8010 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Atin Mukherjee <amukherj@redhat.com>
* glusterd: On gaining quorum spawn_daemons in new threadKaushal M2014-06-084-28/+44
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/7703 from master During startup, if a glusterd has peers, it waits till quorum is obtained to spawn bricks and other services. If peers are not present, the daemons are started during glusterd' startup itself. The spawning of daemons as a quorum action was done without using a seperate thread, unlike the spawn on startup. Since, quotad was launched using the blocking runner_run api, this leads to the thread being blocked. The calling thread is almost always the epoll thread and this leads to a deadlock. The runner_run call blocks the epoll thread waiting for quotad to start, as a result glusterd cannot serve any requests. But the startup of quotad is blocked as it cannot fetch the volfile from glusterd. The fix for this is to launch the spawn daemons task in a seperate thread. This will free up the epoll thread and prevents the above deadlock from happening. BUG: 1105188 Change-Id: Idad1e96fbe1411dfd4b1a542fb5fa115673636c0 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/7995 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Niels de Vos <ndevos@redhat.com>
* rpc: implement server.manage-gids for group resolving on the bricksNiels de Vos2014-05-232-1/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new volume option 'server.manage-gids' can be enabled in environments where a user belongs to more than the current absolute maximum of 93 groups. This option triggers the following behavior: 1. The AUTH_GLUSTERFS structure sent by GlusterFS clients (fuse, nfs or libgfapi) will contain only one (1) auxiliary group, instead of a full list. This reduces network usage and prevents problems in encoding the AUTH_GLUSTERFS structure which should fit in 400 bytes. 2. The single group in the RPC Calls received by the server is replaced by resolving the groups server-side. Permission checks and similar in lower xlators are applied against the full list of groups where the user belongs to, and not the single auxiliary group that the client sent. Cherry picked from commit 2fd499d148fc8865c77de8b2c73fe0b7e1737882: > BUG: 1053579 > Signed-off-by: Niels de Vos <ndevos@redhat.com> > Reviewed-on: http://review.gluster.org/7501 > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Santosh Pradhan <spradhan@redhat.com> > Reviewed-by: Harshavardhana <harsha@harshavardhana.net> > Reviewed-by: Anand Avati <avati@redhat.com> Change-Id: I9e540de13e3022f8b63ff893ecba511129a47b91 BUG: 1096425 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/7830 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Santosh Pradhan <spradhan@redhat.com>
* libgfapi: Added support to fetch volume info from glusterd and store in ↵Soumya Koduri2014-05-121-0/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | glfs object. Defined new APIs in the libgfapi module, given a glfs object, * to send handshake RPC call to glusterd process to fetch UUID of the volume * store it in the glusterfs_context linked to the glfs object. * to parse UUID from its cannonical string format into 16-byte array before sending it to the libgfapi users. Defined a RPC call in glusterd which can be used to query volume related info by other processes using 'clnt_handshake_procs'. Note - Currently this RPC call to glusterd process is used only to fetch UUID. But it can be extended to get other volume related structures as well. In addition to the above, defined a new variable to keep track of such handshake RPCs still in progress to make sure all the corresponding RPC callbacks have been processed before libgfapi returns the glfs object initialized. Also bumping up the GFAPI current version number since there is a new API "glfs_get_volume_id" defined and exposed by libgfapi as part of these changes. Cherry picked from commit 5adb10b9ac1c634334f29732e062b12d747ae8c5: > Change-Id: I303f76d7177d32d25bdb301b1dbcf5cd73f42807 > BUG: 1095775 > Signed-off-by: Soumya Koduri <skoduri@redhat.com> > Reviewed-on: http://review.gluster.org/7218 > Reviewed-by: Anand Avati <avati@redhat.com> > Reviewed-by: Harshavardhana <harsha@harshavardhana.net> > Tested-by: Gluster Build System <jenkins@build.gluster.com> > Reviewed-by: Vijay Bellur <vbellur@redhat.com> This change differs a little from the patch in the master branch: - libgfapi in 3.5 does not have glfs_get_volfile(), so there were some merge conflicts resolved, - libgfapi in 3.5 is not versioned, the configure.ac changes related to the versioning have been skipped, - in the master branch only the XDR .x files are available, release-3.5 requires a manual re-generation of the relates .h and .c files. Change-Id: I52c32d0e69a52a7f4285f74164bca6fd83c4f3b3 BUG: 1095775 Signed-off-by: Soumya Koduri <skoduri@redhat.com> Reviewed-on: http://review.gluster.org/7741 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com>
* glusterd: Reset opinfo.op ONLY if lock succeededKrutika Dhananjay2014-02-071-3/+6
| | | | | | | | | | | | | | ... and also initialise @this before doing anything else. Change-Id: I0244a7f61a826b32f4c2dfe51e246f2593a38211 BUG: 1060434 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6885 Reviewed-by: Kaushal M <kaushal@redhat.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-on: http://review.gluster.org/6922 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mgmt/glusterd: Set right op versions for few options.Vijay Bellur2014-02-031-12/+12
| | | | | | | | | | | | compression and changelog translators appear first in 3.5. op-versions of options corresponding to these translators should be 3. Change-Id: Ib514207743e36eba53c3d5cf477c85136cf30b42 BUG: 923540 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6849 Reviewed-by: Kaushal M <kaushal@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterd/geo-rep : Allow volume to stop if geo-rep is not active.Kotresh H R2014-01-283-1/+176
| | | | | | | | | | | | | | | In staging phase of volume stop, code is added to read the state_file for each slave of the master to which the volume belongs. If any of the geo-rep session is active with at least one slave, volume is not allowed to stop else it is allowed. Change-Id: I4a01a357fc86b872e9635b3d19998cdbd9545114 BUG: 1049727 Signed-off-by: Kotresh H R <khiremat@redhat.com> Reviewed-on: http://review.gluster.org/6663 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6821
* mgmt/glusterd: make sure quota enforcer has established connection with ↵Raghavendra G2014-01-282-18/+20
| | | | | | | | | | | | | | | | | | quotad before marking quota as enabled. without this patch there is a window of time when quota is marked as enabled in quota-enforcer, but connection to quotad wouldn't have been established. Any checklimit done during this period can result in a failed fop because of unavailability of quotad. Change-Id: I0d509fabc434dd55ce9ec59157123524197fcc80 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> BUG: 969461 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6572 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6820
* features/quota: Metadata cleanupVarun Shastry2014-01-281-5/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quota and marker uses 'trusted.glusterfs.quota*' and 'trusted.pgfid*' xattrs to store its configurations and accounting information and also to build the parent inode chain in case of absense of path. Problem: After disabling and then enabling quota back, the xattrs may contain stale data leading to impaired accounting and thus improper enforcement. Solution: Clean up all the quota related xattrs after quota disable. Marker xlator implements a virtual xattr to cleanup quota and pgfid xattrs. In this approach glusterd mounts an auxiliary mount and sends the below command to all the files by crawling the mountpoint. #setfattr -n "glusterfs.quota-xattr-cleanup" -v 1 <path/to/file> Credit: Krishnan Parthasarathi <kparthas@redhat.com> Varun Shastry <vshastry@redhat.com> Change-Id: I9380eca58a285dc27dd572de1767aac8f2cd8049 BUG: 969461 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6369 Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6838
* mgmt/glusterd: Improve the description in volume set help outputVarun Shastry2014-01-281-2/+7
| | | | | | | | | | Change-Id: I785648970f53033a69922c23110b5eea9e47feb3 BUG: 1046030 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6573 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/6837
* glusterd/geo-rep: more glusterd and cli fixes for geo-rep.Ajeet Jha2014-01-276-251/+483
| | | | | | | | | | | | | | | | | | | | | | | | | | -> handle option validation cases in reset case. -> Creating valid conf path when glusterd restarts. -> Reading the gsyncd worker thread status and displaying it. -> Displaying status-detail per worker. -> Fetch checkpoint info in geo-rep status. -> use-tarssh value validation added. misc: misc geo-rep fixes based on cluster, logrotate etc.. -> cluster/dht: fix 'stime' getxattr getting overwritten. -> cluster/afr: return max of 'stime' values in subvol. -> geo-rep-logrotate: Sending SIGHUP to geo-rep auxiliary. -> cluster/dht: fix convoluted logic while aggregating. -> cluster/*: fix 'stime' min/max fetch logic. Change-Id: I811acea0bbd6194797a3e55d89295d1ea021ac85 BUG: 1036552 Signed-off-by: Ajeet Jha <ajha@redhat.com> Reviewed-on: http://review.gluster.org/6405 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@gmail.com> Reviewed-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6810 Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/compress: rename "compress" option to "network.compression"Niels de Vos2014-01-262-15/+20
| | | | | | | | | | | | | | | | Prevent mistaking the "compress" options for storage (at rest) compression. The cdc-xlator is implemented to support compressing of network traffic (READ and WRITE FOPs). URL: http://www.gluster.org/community/documentation/index.php/Features/On-Wire_Compression_+_Decompression Master-Change-Id: I9fedf4106dcb226d135ab92e4b533aff284881d7 Master-Reviewed-on: http://review.gluster.org/6765 Change-Id: Ib882af855b36df93fac46236c349c33dd4c3ced4 BUG: 1053670 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/6773 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Prashanth Pai <ppai@redhat.com>
* glusterd: Save/restore/sync rebalance dictKaushal M2014-01-031-0/+6
| | | | | | | | | | | | | | | | | | | | | A dictionary was added to store additional information of a rebalance process, like the bricks being removed in case of a rebalance started by remove-brick. This dictionary wasn't being stored/restored or synced during volume sync, leading to errors like a volume status command failing. These issues have been fixed in this patch. The rebalance dict is now stored/restored and also exported/imported during volume sync. Also, this makes sure that the rebalance dict is only create on remove-brick start. This adds a bricks decommissioned status to the information imported/exported during volume sync. BUG: 1040809 Change-Id: I46cee3e4e34a1f3266a20aac43368854594f01bc Signed-off-by: Kaushal M <kaushal@redhat.com> Backport-of: http://review.gluster.org/6492 Reviewed-on: http://review.gluster.org/6524 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: rebalance to ref volinfo before startingKrishnan Parthasarathi2013-12-232-5/+11
| | | | | | | | | | | Backport of http://review.gluster.org/6522 Change-Id: Ib316897dcbd0748bfb3bfcda186b9fe30c07f80f BUG: 1038051 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6570 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: make volinfo a refcnt'ed object.Krishnan Parthasarathi2013-12-236-6/+55
| | | | | | | | | | | | | | Backport of http://review.gluster.org/6521 Add glusterd_volinfo_remove(..) which removes @volinfo from the list of volumes in the cluster and performs an unref on @volinfo Change-Id: I5f546ca58f61bc334ab1bab4c51c4a21e1f66161 BUG: 1038051 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6569 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: ignore failure to stop a stopped service.Krishnan Parthasarathi2013-12-231-0/+12
| | | | | | | | | | | | | | | | Backport of http://review.gluster.org/6525 kill(2) returns -1 with errno set to ESRCH when the pid of the process being killed doesn't exist. Failing glusterd_brick_stop on a stopped brick could result in volume-stop failing, in commit phase. This fix prevents that from happening. Change-Id: I00f46fa06e489a671efbb8e4119f545f8ccea329 BUG: 1038051 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6568 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Fix stopping of stale rebalance processesKrishnan Parthasarathi2013-12-231-102/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/6531 Trying to stop rebalance process via RPC using the GD_SYNCOP macro, could lead to glusterd crashing. In case of an implicit volume update, which happens when a peer comes back up, the stop function would be called in the epoll thread. This would lead to glusterd crashing as the epoll thread doesn't have synctasks for the GD_SYNCOP macro to make use of. Instead of using the RPC method, we now terminate the rebalance process by kill(). The rebalance process has been designed to be resistant to interruption, so this will not lead to any data corruption. Also, when checking for stale rebalance task, make sure that the old task-id is not null. Change-Id: I54dd93803954ee55316cc58b5877f38d1ebc40b9 BUG: 1044327 Signed-off-by: Kaushal M <kaushal@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6567 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rpc,glusterd: Use rpc_clnt notifyfn to cleanup mydataKrishnan Parthasarathi2013-12-234-8/+27
| | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/5512 rpc: - On a RPC_TRANSPORT_CLEANUP event, rpc_clnt_notify calls the registered notifyfn with a RPC_CLNT_DESTROY event. The notifyfn should properly cleanup the saved mydata on this event. - Break the reconnect chain when an rpc client is disabled. This will prevent new disconnect events which can lead to crashes. glusterd: - Added support for RPC_CLNT_DESTROY in glusterd_brick_rpc_notify - Use a common glusterd_rpc_clnt_unref() function throught glusterd in place of rpc_clnt_unref(). This function correctly gives up the big-lock before performing the unref. Change-Id: I93230441c5089039643fc9f5632477ef1b695348 BUG: 962619 Signed-off-by: Kaushal M <kaushal@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6566 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Save/restore/sync rebalance dictKrishnan Parthasarathi2013-12-233-77/+208
| | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/6492 A dictionary was added to store additional information of a rebalance process, like the bricks being removed in case of a rebalance started by remove-brick. This dictionary wasn't being stored/restored or synced during volume sync, leading to errors like a volume status command failing. These issues have been fixed in this patch. The rebalance dict is now stored/restored and also exported/imported during volume sync. Also, this makes sure that the rebalance dict is only create on remove-brick start. This adds a bricks decommissioned status to the information imported/exported during volume sync. Change-Id: I56fed23dc2de80a96648055fe705e9c3ffd55227 BUG: 1040809 Signed-off-by: Kaushal M <kaushal@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6565 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Improve rebalance handling during volume syncKrishnan Parthasarathi2013-12-232-10/+200
| | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/6334 Glusterd will now correctly copy existing rebalance information when a volinfo is updated during volume sync. If the existing rebalance information was stale, then any existing rebalance process will be termimnated. A new rebalance process will be started only if there is no existing rebalance process. The rebalance process will not be started if the existing rebalance session had completed, failed or been stopped. Change-Id: I68c5984267c188734da76770ba557662d4ea3ee0 BUG: 1036464 Signed-off-by: Kaushal M <kaushal@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6564 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: Aggregate tasks status in 'volume status [tasks]'Krishnan Parthasarathi2013-12-234-18/+215
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backport of http://review.gluster.org/6230 Previously, glusterd used to just send back the local status of a task in a 'volume status [tasks]' command. As the rebalance operation is distributed and asynchronus, this meant that different peers could give different status values for a rebalance or remove-brick task. With this patch, all the peers will send back the tasks status as a part of the 'volume status' commit op, and the origin peer will aggregate these to arrive at a final status for the task. The aggregation is only done for rebalance or remove-brick tasks. The replace-brick task will have the same status on all the peers (see comment in glusterd_volume_status_aggregate_tasks_status() for more information) and need not be aggregated. The rebalance process has 5 states, NOT_STARTED - rebalance process has not been started on this node STARTED - rebalance process has been started and is still running STOPPED - rebalance process was stopped by a 'rebalance/remove-brick stop' command COMPLETED - rebalance process completed successfully FAILED - rebalance process failed to complete successfully The aggregation is done using the following precedence, STARTED > FAILED > STOPPED > COMPLETED > NOT_STARTED The new changes make the 'volume status tasks' command a distributed command as we need to get the task status from all peers. The following tests were performed, - Start a remove-brick task and do a status command on a peer which doesn't have the brick being removed. The remove-brick status was given correctly as 'in progress' and 'completed', instead of 'not started' - Start a rebalance task, run the status command. The status moved to 'completed' only after rebalance completed on all nodes. Also, change the CLI xml output code for rebalance status to use the same algorithm for status aggregation. Change-Id: Ifd4aff705aa51609a612d5a9194acc73e10a82c0 BUG: 1027094 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> http://review.gluster.org/6230 Reviewed-on: http://review.gluster.org/6562 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd: create rpc obj for rebalance only if absentv3.5.0qa3Krishnan Parthasarathi2013-12-053-38/+15
| | | | | | | | | Change-Id: Iff305023577ff92a8f43f24dafcf201f86805769 BUG: 1038051 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/6424 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: submit RPC requests without holding big lockAnand Avati2013-12-051-4/+27
| | | | | | | | | | | | | | | If the endpoint of an RPC is not connected, the callback is called synchronously within rpc_clnt_submit(). Since callbacks typically hold the big lock, give up the big lock before calling rpc_clnt_submit and acquire it freshly after the call. Change-Id: Id89d8dd86c1a4012739ef4af7ea0935492b1a02b BUG: 1037849 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/6415 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Krishnan Parthasarathi <kparthas@redhat.com>
* cli, glusterd: More quota fixes ...Krutika Dhananjay2013-12-033-311/+283
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ... which may be grouped under the following categories: 1. Fix incorrect cli exit status for 'quota list' cmd 2. Print appropriate error message on quota parse errors in cli Authored by: Anuradha Talur <atalur@redhat.com> 3. glusterd: Improve quota validation during stage-op 4. Fix peer probe issues resulting from quota conf checksum mismatches 5. Enhancements to CLI output in the event of quota command failures Authored by: Kaushal Madappa <kmadappa@redhat.com> 7. Move aux mount location from /tmp to /var/run/gluster Authored by: Krishnan Parthasarathi <kparthas@redhat.com> 8. Fix performance issues in quota limit-usage Authored by: Krutika Dhananjay <kdhananj@redhat.com> Note: Some functions that were used in earlier version of quota, that aren't called anymore have been removed. Change-Id: I963d4145f3ecdfe30c61bfa8920baccb33d2d4bd BUG: 969461 Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-on: http://review.gluster.org/6386 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/quota: Add the quota config xattr to newly added brickVarun Shastry2013-11-261-0/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Issue: Quota directory limit configuration is stored in the xattrs. When a new brick is added these 'limit-set' xattrs have to be created to the directory in the new brick. This is done by the dht directory healing when the directory is created in the new brick. Since 'root' directory is already created DHT doesn't heal the limit-set xattr root. Solution: When the add-brick command is issued run the below hook script to heal the 'limit-set' xattr. The hook script does the following only if limit is configured on root. 1. Create an auxiliary mount. 2. getxattr 'limit-set' on the root 3. setxattr the same value on the root But this script needs the volume to be started to make the auxiliary mount. To handle the case when the add-brick is issued when the volume was stopped, symlink is created by the 'master' script to the corresponding location and these two are by default disabled. So, a 'master' script is added in the add-brick/pre. When add-brick command is issued, it enables one of the scripts mentioned above based on the condition, if volume is started - enable add-brick/post script else - enable start/post script After the actual script completes its job, it disables itself. Note: The enabling and disabling of the script is based on the glusterd's logic, that it only runs the scripts which starts its name with 'S'. So, Enable - symlink the file to 'S'* Disable - unlink the symlink. Change-Id: I2d3947a4d686c54417ec95f530af3bdd3444f4e2 BUG: 969461 Signed-off-by: Varun Shastry <vshastry@redhat.com> Reviewed-on: http://review.gluster.org/6104 Reviewed-by: Brian Foster <bfoster@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: use proper copy to set node-nameBala.FA2013-11-261-2/+19
| | | | | | | | | | | | Previously node-name is set to point to node-uuid which could cause memory leak. This is fixed by having memory copy of node-uuid. BUG: 1012296 Change-Id: I3b638ec289d5b167c6e752ef1ba41f41efacb9da Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6330 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli/glusterd: Changes to quota command Quota featureRaghavendra G2013-11-2617-584/+2536
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | re-work. Following are the cli commands that are new/re-worked: ====================================================== volume quota <VOLNAME> {enable|disable|list [<path> ...]|remove <path>| default-soft-limit <percent>} | volume quota <VOLNAME> {limit-usage <path> <size> [<percent>]} | volume quota <VOLNAME> {alert-time|soft-timeout|hard-timeout} {<time>} volume status [all | <VOLNAME> [nfs|shd|<BRICK>|quotad]] [detail|clients|mem|inode|fd|callpool] volume statedump <VOLNAME> [nfs|quotad] [all|mem|iobuf|callpool|priv|fd|inode|history] glusterd changes: ================= * Quota limits are now set as extended attributes by glusterd from the aux mount created by the cli. * The gfids of the directories on which quota limits are set for a given volume are stored in /var/lib/glusterd/vols/<volname>/quota.conf file in binary format, and whose cksum and version is stored in /var/lib/glusterd/vols/<volname>/quota.cksum. Original-author: Krutika Dhananjay <kdhananj@redhat.com> Original-author: Krishnan Parthasarathi <kparthas@redhat.com> BUG: 969461 Change-Id: If32bba36c67f9c2a30417af9c6389045b2b7c13b Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com> Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-on: http://review.gluster.org/6003 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* posix: placeholders for GFID to path conversionRaghavendra G2013-11-262-19/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | what? ===== The following is an attempt to generate the paths of a file when only its gfid is known. To find the path of a directory, the symlink handle to the directory maintained in the ".glusterfs" backend directory is read. The symlink handle is generated using the gfid of the directory. It (handle) contains the directory's name and parent gfid, which are used to recursively construct the absolute path as seen by the user from the mount point. A similar approach cannot be used for a regular file or a symbolic link since its hardlink handle, generated using its gfid, doesn't contain its parent gfid and basename. So xattrs are set to store the parent gfids and the number of hardlinks to a file or a symlink having the same parent gfid. When an user/application requests for the paths of a regular file or a symlink with multiple hardlinks, using the parent gfids stored in the xattrs, the paths of the parent directories are generated as mentioned earlier. The base names of the hardlinks (with the same parent gfid) are determined by matching the actual backend inode numbers of each entry in the parent directory with that of the hardlink handle. Xattr is set on a regular file, link, and symbolic link as follows, Xattr name : trusted.pgfid.<pargfidstr> Xattr value : <number of hardlinks to a regular file/symlink with the same parentgfid> If a regular file, hard link, symbolic link is created then an xattr in the above format is set in the backend. how to use? =========== This functionality can be used through getxattr interface. Two keys - glusterfs.ancestry.dentry and glusterfs.ancestry.path - enable usage of this functionality. A successful getxattr will have the result stored under same keys. Values will be, glusterfs.ancestry.dentry: -------------------------- A linked list of gf-dirent structures for all possible paths from root to this gfid. If there are multiple paths, the linked-list will be a series of paths one after another. Each path will be a series of dentries representing all components of the path. This key is primarily for internal usage within glusterfs. glusterfs.ancestry.path: ------------------------ A string containing all possible paths from root to this gfid. Multiple hardlinks of a file or a symlink are displayed as a colon seperated list (this could interfere with path components containing ':'). e.g. If there is a file "file1" in root directory with two hardlinks, "/dir2/link2tofile1" and "/dir1/link1tofile1", then [root@alpha gfsmntpt]# getfattr -n glusterfs.ancestry.path -e text file1 glusterfs.ancestry.path="/file1:/dir2/link2tofile1:/dir1/link1tofile1" Thanks Amar, Avati and Venky for the inputs. Original Author: Ramana Raja <rraja@redhat.com> BUG: 990028 Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: I0eaa9101e333e0c1f66ccefd9e95944dd4a27497 Reviewed-on: http://review.gluster.org/5951 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cli: fix possible memory leaksBala.FA2013-11-212-1/+6
| | | | | | | | | BUG: 955548 Change-Id: Iae410712e7e6d7a76cd537c77f1919e3b4cdf6bb Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6328 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* bd: Add Zerofill FOP supportM. Mohan Kumar2013-11-202-0/+17
| | | | | | | | | | BUG: 1028673 Change-Id: I9ba8e3e6cf2f888640b4d2a2eb934a27ff903c42 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/6290 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gNFS: Coverity fix for CID 1128906Santosh Kumar Pradhan2013-11-201-11/+13
| | | | | | | | | | | | | Fix the Coverity issue introduced in RFE: NFS volume set/reset commit i.e. http://review.gluster.org/6236 Change-Id: I817b9da03a3ce7f5511303faea0c50dfdad60ff4 BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6307 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Start rebalance only where requiredKaushal M2013-11-203-1/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | Gluster was starting rebalance processes on peers where it wasn't required in two cases. - For a normal rebalance command on a volume, rebalance processes were started on all peers instead of just the peers which contain bricks of the volume - For rebalance process being restarted by a volume sync, caused by a new peer being probed or a peer restarting, rebalance processes were started on all peers, for both a normal rebalance and for remove-brick needing rebalance. This patch adds a new check before starting rebalance process in the above two cases. - For rebalance process required by a rebalance command, each peer will check if it contains atleast one brick of the volume - For rebalance process required by a remove-brick command, each peer will check if it contains atleast one of the bricks being removed Change-Id: I512da16994f0d5482889c3a009c46dc20a8a15bb BUG: 1031887 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/6301 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krutika Dhananjay <kdhananj@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* mgmt/glusterd: fix undefined sybmol error related to BDPranith Kumar K2013-11-193-1/+5
| | | | | | | | | | Change-Id: I2210f1ac7de04c6025c0ec02d998b626d41466ae BUG: 1028672 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/6303 Reviewed-by: M. Mohan Kumar <mohan@in.ibm.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cli: add peerid to volume status xml outputBala.FA2013-11-142-0/+36
| | | | | | | | | | | | This patch adds <peerid> tag to bricks and nfs/shd like services to volume status xml output. BUG: 955548 Change-Id: I9aaa9266e4d56f632235eaeef565e92d757c0694 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/6162 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaushal M <kaushal@redhat.com>
* gNFS: RFE for NFS connection behaviorSantosh Kumar Pradhan2013-11-144-0/+148
| | | | | | | | | | | | | | | Implement reconfigure() for NFS xlator so that volume set/reset wont restart the NFS server process. But few options can not be reconfigured dynamically e.g. nfs.mem-factor, nfs.port etc which needs NFS to be restarted. Change-Id: Ic586fd55b7933c0a3175708d8c41ed0475d74a1c BUG: 1027409 Signed-off-by: Santosh Kumar Pradhan <spradhan@redhat.com> Reviewed-on: http://review.gluster.org/6236 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Reviewed-by: Anand Avati <avati@redhat.com>
* Transparent data encryption and metadata authenticationEdward Shishkin2013-11-132-11/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | .. in the systems with non-trusted server This new functionality can be useful in various cloud technologies. It is implemented via a special encryption/crypt translator,which works on the client side and performs encryption and authentication; 1. Class of supported algorithms The crypt translator can support any atomic symmetric block cipher algorithms (which require to pad plain/cipher text before performing encryption/decryption transform (see glossary in atom.c for definitions). In particular, it can support algorithms with the EOF issue (which require to pad the end of file by extra-data). Crypt translator performs translations user -> (offset, size) -> (aligned-offset, padded-size) ->server (and backward), and resolves individual FOPs (write(), truncate(), etc) to read-modify-write sequences. A volume can contain files encrypted by different algorithms of the mentioned class. To change some option value just reconfigure the volume. Currently only one algorithm is supported: AES_XTS. Example of algorithms, which can not be supported by the crypt translator: 1. Asymmetric block cipher algorithms, which inflate data, e.g. RSA; 2. Symmetric block cipher algorithms with inline MACs for data authentication. 2. Implementation notes. a) Atomic algorithms Since any process in a stackable file system manipulates with local data (which can be obsoleted by local data of another process), any atomic cipher algorithm without proper support can lead to non-POSIX behavior. To resolve the "collisions" we introduce locks: before performing FOP->read(), FOP->write(), etc. the process should first lock the file. b) Algorithms with EOF issue Such algorithms require to pad the end of file with some extra-data. Without proper support this will result in losing information about real file size. Keeping a track of real file size is a responsibility of the crypt translator. A special extended attribute with the name "trusted.glusterfs.crypt.att.size" is used for this purpose. All files contained in bricks of encrypted volume do have "padded" sizes. 3. Non-trusted servers and Metadata authentication We assume that server, where user's data is stored on is non-trusted. It means that the server can be subjected to various attacks directed to reveal user's encrypted personal data. We provide protection against such attacks. Every encrypted file has specific private attributes (cipher algorithm id, atom size, etc), which are packed to a string (so-called "format string") and stored as a special extended attribute with the name "trusted.glusterfs.crypt.att.cfmt". We protect the string from tampering. This protection is mandatory, hardcoded and is always on. Without such protection various attacks (based on extending the scope of per-file secret keys) are possible. Our authentication method has been developed in tight collaboration with Red Hat security team and is implemented as "metadata loader of version 1" (see file metadata.c). This method is NIST-compliant and is based on checking 8-byte per-hardlink MACs created(updated) by FOP->create(), FOP->link(), FOP->unlink(), FOP->rename() by the following unique entities: . file (hardlink) name; . verified file's object id (gfid). Every time, before manipulating with a file, we check it's MACs at FOP->open() time. Some FOPs don't require a file to be opened (e.g. FOP->truncate()). In such cases the crypt translator opens the file mandatory. 4. Generating keys Unique per-file keys are derived by NIST-compliant methods from the a) parent key; b) unique verified object-id of the file (gfid); Per-volume master key, provided by user at mount time is in the root of this "tree of keys". Those keys are used to: 1) encrypt/decrypt file data; 2) encrypt/decrypt file metadata; 3) create per-file and per-link MACs for metadata authentication. 5. Instructions Getting started with crypt translator Example: 1) Create a volume "myvol" and enable encryption: # gluster volume create myvol pepelac:/vols/xvol # gluster volume set myvol encryption on 2) Set location (absolute pathname) of your master key: # gluster volume set myvol encryption.master-key /home/me/mykey 3) Set other options to override default options, if needed. Start the volume. 4) On the client side make sure that the file /home/me/mykey exists and contains proper per-volume master key (that is 256-bit AES key). This key has to be in hex form, i.e. should be represented by 64 symbols from the set {'0', ..., '9', 'a', ..., 'f'}. The key should start at the beginning of the file. All symbols at offsets >= 64 are ignored. 5) Mount the volume "myvol" on the client side: # glusterfs --volfile-server=pepelac --volfile-id=myvol /mnt After successful mount the file which contains master key may be removed. NOTE: Keeping the master key between mount sessions is in user's competence. ********************************************************************** WARNING! Losing the master key will make content of all regular files inaccessible. Mount with improper master key allows to access content of directories: file names are not encrypted. ********************************************************************** 6. Options of crypt translator 1) "master-key": specifies location (absolute pathname) of the file which contains per-volume master key. There is no default location for master key. 2) "data-key-size": specifies size of per-file key for data encryption Possible values: . "256" default value . "512" 3) "block-size": specifies atom size. Possible values: . "512" . "1024" . "2048" . "4096" default value; 7. Test cases Any workload, which involves the following file operations: ->create(); ->open(); ->readv(); ->writev(); ->truncate(); ->ftruncate(); ->link(); ->unlink(); ->rename(); ->readdirp(). 8. TODOs: 1) Currently size of IOs issued by crypt translator is restricted by block_size (4K by default). We can use larger IOs to improve performance. Change-Id: I2601fe95c5c4dc5b22308a53d0cbdc071d5e5cee BUG: 1030058 Signed-off-by: Edward Shishkin <edward@redhat.com> Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4667 Tested-by: Gluster Build System <jenkins@build.gluster.com>
* bd: Add support to create clone, snapshot and merge of LV images.M. Mohan Kumar2013-11-134-10/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | Special xattr names "clone" & "snapshot" can be used to create full and linked clone of the LV images. GFID of destination posix file (to be mapped) is passed as a value to the xattr. Destination posix file must exist before running this operation. These operations form a basis for offloading storage related operations from QEMU to GlusterFS. Syntax for full clone: xattr name: "clone" value: "gfid-of-dest-file" Syntax for linked clone: xattr name: "snapshot" value: "gfid-of-dest-file" Syntax for merging: xattr name: "merge" value: "path-to-snapshot-file" Example: setfattr -n clone -v <gfid-of-dest-file> /media/source setfattr -n snapshot -v <gfid-of-dest-file> /media/source setfattr -n merge -v "/media/sn" /media/sn Change-Id: Id9f984a709d4c2e52a64ae75bb12a8ecb01f8776 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5626 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* bd: Add aio support to BD xlatorM. Mohan Kumar2013-11-131-0/+4
| | | | | | | | | | | | Volume option bd-aio controls AIO feature for BD xlator. Code taken from posix-aio.c Change-Id: Ib049bd59c9d3f9101d33939838322cfa808de053 BUG: 1028672 Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com> Reviewed-on: http://review.gluster.org/5748 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>