summaryrefslogtreecommitdiffstats
path: root/libglusterfs
Commit message (Collapse)AuthorAgeFilesLines
* store: Add (un)locking functionalityNiels de Vos2013-08-022-4/+73
| | | | | | | | | | | | | | | | | | | | | Some configuration/cache files (like the NFS rmtab) can be stored on a GlusterFS volume and be used by multiple storage servers. This requires suitable locking for the gf_store_handle_t structure. Introduce gf_store_lock() and gf_store_unlock() for this purpose. The gf_store_locked_local() function can be used to check if the gf_store_handle_t has been locked by the current process. This change also includes an unrelated correction where a FILE* was getting leaked. Krishnan Parthasarathi identified this while reviewing the new locking functionality. Change-Id: I431b7510801841d4bad64480b4bb99d87e2ad347 BUG: 904065 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/4677 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs/client_t client_t implementation, phase 1Kaleb S. KEITHLEY2013-07-297-52/+1292
| | | | | | | | | | | | | | | | | | | | | | | | Implementation of client_t The feature page for client_t is at http://www.gluster.org/community/documentation/index.php/Planning34/client_t In addition to adding libglusterfs/client_t.[ch] it also extracts/moves the locktable functionality from xlators/protocol/server to libglusterfs, where it is used; thus it may now be shared by other xlators too. This patch is large as it is. Hooking up the state dump is left to do in phase 2 of this patch set. (N.B. this change/patch-set supercedes previous change 3689, which was corrupted during a rebase. That change will be abandoned.) BUG: 849630 Change-Id: I1433743190630a6d8119a72b81439c0c4c990340 Signed-off-by: Kaleb S. KEITHLEY <kkeithle@redhat.com> Reviewed-on: http://review.gluster.org/3957 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* libglusterfs: Fix compilation warningPranith Kumar K2013-07-291-1/+1
| | | | | | | | | Change-Id: Ibba7a6fd3119c85c78cb12628d85c7f9210e6b8c BUG: 928648 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5412 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Handle REPLICATE_TRASH_DIR from old bricksPranith Kumar K2013-07-263-0/+13
| | | | | | | | | Change-Id: Ib99f79d3fa607c818dbc62006516480f598d8add BUG: 886998 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4640 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: increase the auxillary group limit to 65536Anand Avati2013-07-244-5/+50
| | | | | | | | | | | | | Make the allocation of groups dynamic and increase the limit to 65536. Change-Id: I702364ff460e3a982e44ccbcb3e337cac9c2df51 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/5111 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* features/changelog: changelog translatorAvra Sengupta2013-07-223-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is the initial version of the Changelog Translator. What is it ----------- Goal is to capture changes performed on a GlusterFS volume. The translator needs to be loaded on the server (bricks) and captures changes in a plain text file inside a configured directory path (controlled by "changelog-dir", should be somewhere in <export>/.glusterfs/changelog by default). Changes are classified into 3 types: - Data: : TYPE-I - Metadata : TYPE-II - Entry : TYPE-III Changelog file is rolled over after a certain time interval (defauls to 60 seconds) after which a changelog is started. The thing to be noted here is that for a time interval (time slice) multiple changes for an inode are recorded only once (ie. say for 100+ writes on an inode that happens within the time slice has only a single corresponding entry in the changelog file). That way we do not bloat up the changelog and also save lots of writes. Changelog Format ----------------- TYPE-I and TYPE-II changes have the gfid on the entity on which the operation happened. TYPE-III being a entry op requires the parent gfid and the basename. Changelog format has been kept to a minimal and it's upto the consumers to do the heavy loading of figuring out deletes, renames etc.. A single changelog file records all three types of changes, with each change starting with an identifier ("D": DATA, "M": METADATA and "E": ENTRY). Option is provided for the encoding type (See TUNABLES). Consumers ---------- The only consumer as of today would be geo-replication, although backup utilities, self-heal, bit-rot detection could be possible consumers in the future. CLI ---- By default, change-logging is disabled (the translator is present in the server graph but does nothing). When enabled (via cli) each brick starts to log the changes. There are a set of tunable that can be used to change the translators behaviour: - enable/disable changelog (disabled by default) gluster volume set <volume> changelog {on|off} - set the logging directory (<brick>/.glusterfs/changelogs is the default) gluster volume set <volume> changelog-dir /path/to/dir - select encoding type (binary (default) or ascii) gluster volume set <volume> encoding {binary|ascii} - change the rollover time for the logs (60 secs by default) gluster volume set <volume> rollover-time <secs> - when secs > 0, changelog file is not open()'d with O_SYNC flag - and fsync is trigerred periodically every <secs> seconds. gluster volume set <volume> fsync-interval <secs> features/changelog: changelog consumer library (libgfchangelog) A shared library is provided for the consumer of the changelogs for easy acess via APIs. Application can link against this library and request for changelog updates. Conversion of binary logs to human-readable ascii format is also taken care by the library which keeps a copy of the changelog in application provided working directory. Change-Id: I75575fb7f1c53d2bec3dba1a329ea7bb3c628497 BUG: 847839 Original Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5127 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* log: send current gf_log to syslog conditionallyBala.FA2013-07-192-19/+105
| | | | | | | | | | | | | | | | When compile time option GF_USE_SYSLOG is enabled (which is default), generated logs are sent to syslog with error code ERR_DEV. User can opt to use traditional log at run time by creating /var/log/glusterd/logger.conf file and restarting respective gluster services. Change-Id: I9837d0f99da1afc2189d7ecd214c4293ec53715a BUG: 928648 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/5002 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* log: enhance syslog logging using CEE formatBala.FA2013-07-191-0/+211
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch enables to use syslog as log target in addition to the default. The logs are sent in CEE format (http://cee.mitre.org/). This logging can be disabled using compile time option by ./configure --disable-syslog (or) rpmbuild glusterfs.tar.gz --without syslog The framework provides two api void gf_openlog (const char *ident, int option, int facility); void gf_syslog (int error_code, int facility_priority, char *format, ...); consumers need to call gf_openlog() prior to gf_syslog() like the way traditional syslog function calls. error_code is mandatory when using gf_syslog(). For example, gf_openlog (NULL, -1, -1); gf_syslog (GF_ERR_DEV, LOG_ERR, "error reading configuration file"); Using syslog, admin is free to configure logger to * reduce repeated log messages * forward logs to remote logger * execute a command on certain log pattern * alert people for certain log pattern by email, snmp etc * and many more Change-Id: Ibacbcbbc547192893fc4a46b387496b622e4811f BUG: 928648 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/4915 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* log: error code generation supportBala.FA2013-07-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | error code and message are generated at compile time by reading a json file which contains information of elements for each error code. This framework provides error handling and ability to do more cleaner log messages to users. error-codes.json file contains error description is below format { "ERR_NAME": {"code": ERR_NUM, "message": {"LOCALE": "ERR_MESSAGE"}} } At compile time autogen.sh calls gen-headers.py which produces C header file libglusterfs/src/gf-error-codes.h. This header has a function const char *_gf_get_message (int code); which returns respective ERR_MESSAGE for given ERR_NUM. Change-Id: Ieefbf4c470e19a0175c28942e56cec98a3c94ff0 BUG: 928648 Signed-off-by: Bala.FA <barumuga@redhat.com> Reviewed-on: http://review.gluster.org/4977 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* fuse: auxiliary gfid mount supportRaghavendra G2013-07-193-12/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * files can be accessed directly through their gfid and not just through their paths. For eg., if the gfid of a file is f3142503-c75e-45b1-b92a-463cf4c01f99, that file can be accessed using <gluster-mount>/.gfid/f3142503-c75e-45b1-b92a-463cf4c01f99 .gfid is a virtual directory used to seperate out the namespace for accessing files through gfid. This way, we do not conflict with filenames which can be qualified as uuids. * A new file/directory/symlink can be created with a pre-specified gfid. A setxattr done on parent directory with fuse_auxgfid_newfile_args_t initialized with appropriate fields as value to key "glusterfs.gfid.newfile" results in the entry <parent>/bname whose gfid is set to args.gfid. The contents of the structure should be in network byte order. struct auxfuse_symlink_in { char linkpath[]; /* linkpath is a null terminated string */ } __attribute__ ((__packed__)); struct auxfuse_mknod_in { unsigned int mode; unsigned int rdev; unsigned int umask; } __attribute__ ((__packed__)); struct auxfuse_mkdir_in { unsigned int mode; unsigned int umask; } __attribute__ ((__packed__)); typedef struct { unsigned int uid; unsigned int gid; char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid string * in canonical form. */ unsigned int st_mode; char bname[]; /* bname is a null terminated string */ union { struct auxfuse_mkdir_in mkdir; struct auxfuse_mknod_in mknod; struct auxfuse_symlink_in symlink; } __attribute__ ((__packed__)) args; } __attribute__ ((__packed__)) fuse_auxgfid_newfile_args_t; An initial consumer of this feature would be geo-replication to create files on slave mount with same gfids as that on master. It will also help gsyncd to access files directly through their gfids. gsyncd in its newer version will be consuming a changelog (of master) containing operations on gfids and sync corresponding files to slave. * Also, bring in support to heal gfids with a specific value. fuse-bridge sends across a gfid during a lookup, which storage translators assign to an inode (file/directory etc) if there is no gfid associated it. This patch brings in support to specify that gfid value from an application, instead of relying on random gfid generated by fuse-bridge. gfids can be healed through setxattr interface. setxattr should be done on parent directory. The key used is "glusterfs.gfid.heal" and the value should be the following structure whose contents should be in network byte order. typedef struct { char gfid[UUID_CANONICAL_FORM_LEN + 1]; /* a null terminated gfid * string in canonical form */ char bname[]; /* a null terminated basename */ } __attribute__((__packed__)) fuse_auxgfid_heal_args_t; This feature can be used for upgrading older geo-rep setups where gfids of files are different on master and slave to newer setups where they should be same. One can delete gfids on slave using setxattr -x and .glusterfs and issue stat on all the files with gfids from master. Thanks to "Amar Tumballi" <amarts@redhat.com> and "Csaba Henk" <csaba@redhat.com> for their inputs. Signed-off-by: Raghavendra G <rgowdapp@redhat.com> Change-Id: Ie8ddc0fb3732732315c7ec49eab850c16d905e4e BUG: 952029 Reviewed-on: http://review.gluster.com/#/c/4702 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Amar Tumballi <amarts@redhat.com> Reviewed-on: http://review.gluster.org/4702 Reviewed-by: Xavier Hernandez <xhernandez@datalab.es> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* core: avoid infinite mutual recursion between THIS and gf_logJeff Darcy2013-07-181-2/+0
| | | | | | | | | | | | | | | | | This caused a stack overflow when (for some unknown reason) pthread_setspecific was failing. Therefore __glusterfs_this_location called gf_log which used THIS which wraps __glusterfs_this_location which . . . you get the idea. We have to break the loop somewhere, and we can't reasonably make _gf_log stop using THIS, so we make __glusterfs_this_location stop using _gf_log. Change-Id: I79c3ea40dd7980bb8ac76a52cdbf5c057b2e1c3c Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/5341 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra G <rgowdapp@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* include <limits.h> for PATH_MAXEmmanuel Dreyfus2013-07-181-0/+1
| | | | | | | | | | | | | | I need to include <limits.h> in order to use PATH_MAX, Otherwise it will not build at mine. I believe it is standard compliant to do so: http://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html BUG: 764655 Change-Id: I3f124466f7f7742e94a9d1256bc9239ec16aab04 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/5340 Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* locks: Added an xdata-based 'cmd' for inodelk count in a given domainshishir gowda2013-07-181-0/+1
| | | | | | | | | | | | | | | | | | | | Following is the semantics of the 'cmd': 1) If @domain is NULL - returns no. of locks blocked/granted in all domains 2) If @domain is non-NULL- returns no. of locks blocked/granted in that domain 3) If @domain is non-existent - returns '0'; This is important since locks xlator creates a domain in a lazy manner. where @domain - a string representing the domain. Change-Id: I5e609772343acc157ca650300618c1161efbe72d BUG: 951195 Original-author: Krishnan Parthasarathi <kparthas@redhat.com> Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4889 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Amar Tumballi <amarts@redhat.com>
* cluster/*: get logic to calculate min() of the 'stime' xattrAvra Sengupta2013-07-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | * in both distribute and replicate (ignoring stripe for now), add logic to calculate the min() of stime values. * What is a 'stime' ? Why is this required: - stime means 'slave xtime', mainly used to keep track of slave node's sync status when distributed geo-replication is used. Logic of calculating 'min()' for this stime is very important as in case of crashes/reboots/shutdown, we will have to 'restart' with crawling from stime time value from the mount point, which gives the 'min()' of all the bricks, which means, we don't miss syncing any files in the above cases. Change-Id: I2be8d434326572be9d4986db665570a6181db1ee BUG: 847839 Original Author: Amar Tumballi <amarts@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/4893 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mount/fuse: Provide option to use/not use kernel-readdirpPranith Kumar K2013-07-121-0/+1
| | | | | | | | | | | | | | By default fuse kernel readdirp usage in fuse xlator is off. When mount option use-readdirp=yes is provided it starts using fuse-kernel's readdirp. Change-Id: Id37edc53b1adc1638186d956c2f74c1e4e48aa59 BUG: 983477 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5322 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* mount/fuse: expose 'glusterfs.gfid*' virtual xattr keyAvra Sengupta2013-07-111-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | currently two keys are exposed: 'glusterfs.gfid' : output is 16byte binary gfid 'glusterfs.gfid.string' : output is 36 byte canonical format of gfid e.g. [root@supernova glusterfs]# getfattr -n glusterfs.gfid -e hex f0 glusterfs.gfid=0x68305acb73e541719804fcf36a4857e8 [root@supernova glusterfs]# getfattr -n glusterfs.gfid.string f0 glusterfs.gfid.string="68305acb-73e5-4171-9804-fcf36a4857e8" early consumers for this key would be geo-replication (as it has being designed to do namespace operations on gfid from the mount point, thereby needing the GFID for entry operations on the slave). Change-Id: I10b23dbd11628566ad6924334253f5d85d01a519 BUG: 847839 Original Author: Venky Shankar <vshankar@redhat.com> Signed-off-by: Avra Sengupta <asengupt@redhat.com> Reviewed-on: http://review.gluster.org/5129 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Bump GD_MAX_OP_VERSIONKaushal M2013-07-051-2/+3
| | | | | | | | | Change-Id: I7a212435ce7e98fe01aad2c0d1f698de8ea84235 BUG: 981278 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5287 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd/common-utils: move hostname helper functions to common-utilsKrishnan Parthasarathi2013-07-042-0/+234
| | | | | | | | | Change-Id: If47e209cb61ea0eb74ee2d6ef9e9342b2d6ee13a BUG: 980838 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5261 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* store: Fix resource leaks in gf_store_iter_* functionsKrishnan Parthasarathi2013-07-022-64/+55
| | | | | | | | | | | | Also, removed (redundant) member fd from gf_store_iter_t Change-Id: I40f0469997f77fa2f578a5495ca4ce285f1a59f2 BUG: 904065 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5243 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* libglusterfs: Fix valid_host_name for consecutive dotsKaushal M2013-07-021-0/+8
| | | | | | | | | | | | | The valid_host_name() function must reject addresses with consecutive dots in them. Change-Id: I1749de80c66e8fbad63b2e014f79e0203906030e BUG: 977246 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/5249 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* rpc: duplicate request cache for nfsRajesh Amaravathi2013-06-214-1/+63
| | | | | | | | | | | | | | | Duplicate request cache provides a mechanism for detecting duplicate rpc requests from clients. DRC caches replies and on duplicate requests, sends the cached reply instead of re-processing the request. Change-Id: I3d62a6c4aa86c92bf61f1038ca62a1a46bf1c303 BUG: 847624 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.org/4049 Reviewed-by: Rajesh Joseph <rjoseph@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* store: move glusterd_store functions from mgmt/glusterd to libglusterfsNiels de Vos2013-06-204-3/+748
| | | | | | | | | | | | | Making the glusterd_store_* functions re-usable will help with future changes that need to read/write lists of items. BUG: 904065 Change-Id: I99fb8eced76d12d5a254567eccff9790b43d8da3 Signed-off-by: Niels de Vos <ndevos@redhat.com> Reviewed-on: http://review.gluster.org/4676 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* cluster/afr: Perform delayed changelog wakeups for anon fdPranith Kumar K2013-06-202-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | Problem: Nfs xlator never does open on a file for performing writes, afr does not perform changelog wakeup for this fd so operations which do metadata operations as soon as the data operations are completed perceive a delay of 'post-op-delay-secs'. Fix: Perform changelog wakeup on anon-fd if the fd with same pid is not present in inode-list. Note: This approach is a short-term fix. A proper fix needs a new domain for taking metadata locks so that data/metadata locks don't compete with each other. Change-Id: I253afb289eadf30c7951e56fb2c4840d7132f5e4 BUG: 966018 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5066 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* libglusterfs: Fix circular buffer to dump entries if count is less than 1024Venkatesh Somyajulu2013-06-131-6/+20
| | | | | | | | | | | | | | | | | | | | Problem: To dump the values present in the circular buffer, index always moves from current index to used_len. But if circular buffer is not completely filled even once then next index to be filled and used length value are always same which means it will never dump any value. Fix: Modified the logic of buffer traversing to dump values so that it will still maintain FIFO and cover both the cases where buffer is either partially filled or being used more than once. Change-Id: If73a5e481cca1751d57aba1136c2d25d23ce073c BUG: 972459 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/5197 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com>
* glusterfs: discard (hole punch) supportBrian Foster2013-06-1310-0/+173
| | | | | | | | | | | | | | | | Add support for the DISCARD file operation. Discard punches a hole in a file in the provided range. Block de-allocation is implemented via fallocate() (as requested via fuse and passed on to the brick fs) but a separate fop is created within gluster to emphasize the fact that discard changes file data (the discarded region is replaced with zeroes) and must invalidate caches where appropriate. BUG: 963678 Change-Id: I34633a0bfff2187afeab4292a15f3cc9adf261af Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/5090 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* gluster: add fallocate fop supportBrian Foster2013-06-1313-0/+212
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement support for the fallocate file operation. fallocate allocates blocks for a particular inode such that future writes to the associated region of the file are guaranteed not to fail with ENOSPC. This patch adds fallocate support to the following areas: - libglusterfs - mount/fuse - io-stats - performance/md-cache,open-behind - quota - cluster/afr,dht,stripe - rpc/xdr - protocol/client,server - io-threads - marker - storage/posix - libgfapi BUG: 949242 Change-Id: Ice8e61351f9d6115c5df68768bc844abbf0ce8bd Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-on: http://review.gluster.org/4969 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Avoid order mismatch in blocking entrylksPranith Kumar K2013-06-052-3/+14
| | | | | | | | | | | | | | | | | | | | | | | Problem: When taking blocking entrylks, afr orders the entrylks based on uuid_compare of gfids of parent dirs, if they are equal then it orders them based on the basenames. While this approach works fine, the implementation assumes loc->gfids to be populated at the time of the comparison, but loc may have gfid in loc->inode->gfid instead of loc->gfid which was leading to order mismatches and dead-locks. Fix: Implemented loc_gfid which gives gfid by checking both loc->gfid, loc->inode->gfid. Used this for ordering the blocking entrylks. Change-Id: Ib0db36bbaf0df09fa87c3c3bb6a834db74fc2154 BUG: 965987 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/5062 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* glusterd-volgen: Improve volume op-versions calculationKaushal M2013-05-313-26/+9
| | | | | | | | | | | | | | | | | | Volume op-versions calculations now take into account if an option, a. enables/disables an xlator, or b. is a boolean option. This prevents op-versions from being updated when a feature is disabled. Also, correctly close the dynamically loaded xlators in xlator_volopt_dynload() and prevent leaks. Change-Id: I895ddeeec6f6a33e509325f0ce6f01b7aad3cf5c BUG: 954256 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4952 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* statedump: Print entries in cb buffer in FIFO ordershishir gowda2013-05-311-6/+5
| | | | | | | | | | | | | | | | | Currently cb buffer was being printed in LIFO order, which is was against the percieved notion of logs having older entries printed before newer entries in the state dumps. Re-did the loop to prevent crash as when w_index == 0, we would access cb[w_index - 1]. Change-Id: Idd085f73fa6937e506a2a1925e42fbcfd2d9bb1c BUG: 966847 Signed-off-by: shishir gowda <sgowda@redhat.com> Reviewed-on: http://review.gluster.org/4968 Reviewed-by: Santosh Pradhan <spradhan@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* logging: Fix to avoid excessive logging.Venkatesh Somyajulu2013-05-281-2/+0
| | | | | | | | | | | | | | | | | | | | | mem_get function: Log message related to mem_pool calloc is removed as its been calculated in mempool 'stats'. This messgae is consuming nearly half of the total log messages in DEBUG mode. dht_hash_compute function: Changed log level from DEBUG to TRACE. client_fdctx_destroy function: Changed log level from DEBUG to TRACE. Change-Id: Ic948db0419e76df4e95ebd0cabaf66eadbaada6b BUG: 966851 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/5086 Reviewed-by: Amar Tumballi <amarts@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* dht,posix: support for case discoveryAnand Avati2013-05-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is support for discovering a filename in a given directory which has a case insensitive match of a given name. It is implemented as a virtual extended attribute on the directory where the required filename is specified in the key. E.g: sh# getfattr -e "text" -n user.glusterfs.get_real_filename:FiLe-B /mnt/samba/patchy getfattr: Removing leading '/' from absolute path names # file: mnt/samba/patchy user.glusterfs.get_real_filename:FiLe-B="file-b" In reality, there can be multiple "answers" as the backend filesystem is case sensitive and there can be multiple files which can strcasecamp() successfully. In this case we pick the first matched file from the first responding server. If a matching file does not exist, we return ENOENT (and NOT ENODATA). This way the caller can differentiate between "unsupported" glusterfs API and file not existing. This API is used by Samba VFS to perform efficient discovery of the real filename without doing a full scan at the Samba level. Change-Id: I53054c4067cba69e585fd0bbce004495bc6e39e8 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4941 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* gfapi: link inodes in relevant entry FOPsAnand Avati2013-05-252-8/+33
| | | | | | | | | | | | Do not let inode linking to happen only in lookup(). While that works, it is inefficient. Change-Id: I51bbfb6255ec4324ab17ff00566375f49d120c06 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4931 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* syncop: copy inode pointer in readdirplusAnand Avati2013-05-251-0/+2
| | | | | | | | | Change-Id: I9ab2b8ac2da9fe13f56b8b08f715a0b603ece0cb BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4930 Reviewed-by: Vijay Bellur <vbellur@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* syncop: synctask shouldn't yawn, it could miss a 'wake'Krishnan Parthasarathi2013-05-212-26/+6
| | | | | | | | | Change-Id: I7731fd33ca0c925cc52f8d105275b44fc625a1e2 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5058 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: Remove task from syncbarrier's waitq before 'wake'Krishnan Parthasarathi2013-05-201-7/+5
| | | | | | | | | | | | | | | | Removing task from syncbarrier's waitq after wake could result in a subsequent syncbarrier_wake, wake'ing up the already running task. This fix makes the removal from waitq and wake 'atomic' The root cause and the fix are similar in spirit to what was observed in synclock's waitq implementation. Change-Id: I7dd9e6ad5945742bcda20eb5a06a9376bb18528e BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/5047 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: Update synctask state appropriatelyKrishnan Parthasarathi2013-05-202-2/+5
| | | | | | | | | | | | | | | | | | | | * Earlier, SYNCOP macro, the only consumer of synctask_yield, would set the task->state to SYNCTASK_SUSPEND. Today, we have glusterd having its own wrapper macros which don't set task's state. There is also the syncbarrier and synclock framework, which also participate in a synctask's scheduling (and need to keep a task's state up to date). It only makes more sense to leave a synctask's state to the synctask library, since its an internal affair. * Need to 'yawn' before 'yield' to avoid re-running tasks to set task->woken appropriately. Change-Id: Ic7a59e6ebcc46f03e53223ca237668d45a3cba40 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4985 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* syncop: Remove task from synclock's waitq before 'wake'Krishnan Parthasarathi2013-05-151-6/+6
| | | | | | | | | | | | | Removing task from synclock's waitq after wake could result in a subsequent unlock, wake'ing up the already running task. This fix makes the removal from waitq and wake 'atomic'. Change-Id: Ie51ccf9d38f2cee84471097644aab95496f488b1 BUG: 948686 Signed-off-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-on: http://review.gluster.org/4987 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* synctask: implement barriers around yield, not the other wayAnand Avati2013-05-042-35/+147
| | | | | | | | | | | | | | | | | | | | In the current implementation, barriers are in the core of the syncprocessors. Wake()s are treated as syncbarrier wake. This is however delicate, as spurious wake()s of the synctask can mess up the accounting of the barrier and waking it prematurely. The fix is to keep yield() and wake() as the basic primitives, and implement barriers as an object impelemented on top of these primitives. This way, only an explicit barrier_wake() gets counted towards the barrier accounting, and spurious wakes will be truly safe. Change-Id: I8087f0f446113e5b2d0853431c0354335ccda076 BUG: 948686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4921 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com>
* Fix uninitialized mutex usage in synctask_destroyEmmanuel Dreyfus2013-04-301-3/+4
| | | | | | | | | | | | | | | synctask_new() initialize task->mutex is task->synccbk is NULL. synctask_done() calls synctask_destroy() if task->synccbk is not NULL. synctask_destroy() always destroys the mutex. Fix that by checking for task->synccbk in synctask_destroy() BUG: 764655 Change-Id: I50bb53bc6e2738dc0aa830adc4c1ea37b24ee2a0 Signed-off-by: Emmanuel Dreyfus <manu@netbsd.org> Reviewed-on: http://review.gluster.org/4913 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* cluster/afr: Avoid self-healing extended attribute used by SELinux.Vijay Bellur2013-04-301-0/+2
| | | | | | | | | | | | | | | | | | Since removexattr() fails to remove "security.selinux" in a system where SELinux is enforcing, xattr self-healing fails. As a consequence of this, user extended attributes are not being healed. Added a check in afr to prune SELinux xattr from the dictionary used for removing xattrs from the sink. Minor changes in tests and md-cache as well. Signed-off-by: Vijay Bellur <vbellur@redhat.com> Change-Id: I854bfc0098dde812ce2afe64b125ee40c04bdeb1 BUG: 957877 Reviewed-on: http://review.gluster.org/4905 Reviewed-by: Venky Shankar <vshankar@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* glusterd: Introduce volume op-versionsKaushal M2013-04-261-1/+0
| | | | | | | | | | | | | | | | | | | | | | | Each volume is now associated with two op-versions, * op_version - the op-version of the highest op-versioned feature enabled * client_op_version - the op-version of the highest op-versioned feature enabled which affects the clients only. These two op-versions are generated dynamically and kept updated during runtime. Glusterd now uses the respective volumes' client-op-version during getspec requests. To achieve the above a new field in the vme table is introduced, client_option, this boolean field tells if the option is a client side option. Change-Id: I12c83b1dd29ab506026efd50d448cebbcee53c27 BUG: 907311 Signed-off-by: Kaushal M <kaushal@redhat.com> Reviewed-on: http://review.gluster.org/4584 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* nfs: change default nfs port to 2049Rajesh Amaravathi2013-04-241-1/+1
| | | | | | | | | | | | | This change makes it possible to mount glusterfs volumes without specifying vers=3 option. Change-Id: If5a974e2bdfd2adbeac3d82af774310cdf30f988 BUG: 832939 Signed-off-by: Rajesh Amaravathi <rajesh@redhat.com> Reviewed-on: http://review.gluster.org/4840 Reviewed-by: Niels de Vos <ndevos@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Kaleb KEITHLEY <kkeithle@redhat.com>
* gfapi: POSIX locking supportAnand Avati2013-04-242-0/+37
| | | | | | | | | Change-Id: I37d9e1fb4a715094876be6af3856c1b4cf398021 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4881 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncop: set credentials of running process in @frameAnand Avati2013-04-241-1/+18
| | | | | | | | | | | | | | | Inherit the pid/euid/egid/groups of the running process in the frame. Do this only in cases where a loaded frame was not presented to the synctask. This behavior is required for Samba VFS. Change-Id: Ib181c90f47c6741197b9ce9f67a19e2914b647d2 BUG: 953694 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4878 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Vijay Bellur <vbellur@redhat.com>
* syncenv: be robust against spurious wake()sAnand Avati2013-04-091-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current implementation, when the callers of synctasks perform a spurious wake() of a sleeping synctask (i.e, an extra wake() soon after a wake() which already woke up a yielded synctask), there is now a possibility of two sync threacs picking up the same synctask. This can result in a crash. The fix is to change ->slept = 0|1 and membership of synctask in runqueue atomically. Today we dequeue a task from the runqueue in syncenv_task(), but reset ->slept = 0 much later in synctask_switchto() in an unlocked manner -- which is safe, when there are no spurious wake()s. However, this opens a race window where, if a second wake() happens after the dequeue, but before setting ->slept = 0, it results in queueing the same synctask in the runqueue once again, and get picked up by a different synctask. This is has been diagnosed to be the crashes in the regression tests of http://review.gluster.org/4784. However that patch still has a spurious wake() [the trigger for this bug] which is yet to be fixed. Change-Id: I9b4b9dd5115d6e62ba45162ae90dd5e917a4f83d BUG: 948686 Signed-off-by: Anand Avati <avati@redhat.com> Reviewed-on: http://review.gluster.org/4795 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* glusterfsd: Cleanup temporary files from /tmpVijay Bellur2013-04-081-0/+6
| | | | | | | | | | | | | | | | For each gluster{d,fs,fsd} start, one or more temporary file(s) created in /tmp were not being unlinked. This patch cleans that up. Modified a typo in an unrelated log message as well. Change-Id: I3dec2a2ca40c7d6828eb238ec9cd08b6072cf0dd BUG: 949327 Signed-off-by: Vijay Bellur <vbellur@redhat.com> Reviewed-on: http://review.gluster.org/4786 Reviewed-by: Krishnan Parthasarathi <kparthas@redhat.com> Reviewed-by: Raghavendra Bhat <raghavendra@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com>
* dict: Put "goto out" in dict_unserialize to avoid process crashVenkatesh Somyajulu2013-04-031-0/+1
| | | | | | | | | | | | | | | | Problem: In the dictionary serialization function, if the [(buf + vallen) > (orig_buf + size)], then memdup is getting failed. Fix: Put "goto out" whenever this condition is met. Change-Id: I662628a936596dbb47825aad47d7dbab2879eb07 BUG: 947824 Signed-off-by: Venkatesh Somyajulu <vsomyaju@redhat.com> Reviewed-on: http://review.gluster.org/4767 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* protocol/client: Print valid loc identifiersPranith Kumar K2013-04-032-1/+16
| | | | | | | | | | Change-Id: I45f91105862a2484b8906a7a63b98ab4aaf80d05 BUG: 924643 Signed-off-by: Pranith Kumar K <pkarampu@redhat.com> Reviewed-on: http://review.gluster.org/4683 Reviewed-by: Jeff Darcy <jdarcy@redhat.com> Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* dht: make nufa/switch call dht's init/finiJeff Darcy2013-04-032-14/+20
| | | | | | | | | | | | | These functions keep changing as new functionality is added, so copying and pasting the code is not a good solution. This way ensures that all fields get initialized properly no matter how much new stuff we throw in. Change-Id: I9e9b043d2d305d31e80cf5689465555b70312756 BUG: 924488 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4710 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>
* core: add dispatch table for init/finiJeff Darcy2013-04-032-10/+27
| | | | | | | | | | | | | This adds a layer of indirection so that derivative translators such as NUFA and switch can refer to the parent's init/fini (in both cases DHT's) without having to create stub functions. Change-Id: I1af1fea70a9ddd2aa20485af7ae65f9660f19dd6 BUG: 924490 Signed-off-by: Jeff Darcy <jdarcy@redhat.com> Reviewed-on: http://review.gluster.org/4709 Tested-by: Gluster Build System <jenkins@build.gluster.com> Reviewed-by: Anand Avati <avati@redhat.com>